Abstract

We are investigating how to program robots so that they learn from experience. 
Our goal is to develop principled methods of learning that can improve a robot's 
performance of a wide range of dynamic tasks. Our interest is in complex tasks 
such as throwing, catching, batting, yo-yoing, and juggling. We have developed 
one method of learning, task-level learning, that successfully improves a robot's 
performance of both a ball-throwing and a juggling task. 

With task-level learning, a robot practices a task, monitors its own perfor- 
mance, and uses that experience to adjust its task-level commands. For example, 
we have programmed a robot to juggle a single ball in three dimensions. The 
robot practices the juggling task by batting a ball into the air with a large paddle. 
The robot uses a real-time binary vision system to track the ball and measure its 
own performance. Task-level learning consists of building a model of the perfor- 
mance errors at the task level during practice. The robot compensates for the 
performance errors by using that model to refine the task-level commands. When 
using task-level learning, the number of hits that the robot can execute before the 
ball is hit out of range dramatically improves. 

Task-level learning is a general method of improving a robot's performance 
of complex dynamic tasks. Task-level learning serves to complement other ap- 
proaches for improving robot performance such as model calibration. Our inves- 
tigation is one step in the process of developing a theoretical and experimental 
foundation for robot learning.

Chapter 1
Introduction 



We are investigating how to program robots so that they learn from experience. 
Our premise is that robots can practice a task and use that experience to improve 
performance. We demonstrate that this premise is valid for two robot tasks — 
throwing and juggling — and suggest that the performance of other dynamic tasks 
can be improved by learning from practice. 

We base our learning approach on a rather commonplace observation. A person 
throwing a ball for the first time at a target will often miss. If the ball is thrown 
too short, the person will aim a little further and throw again. If the ball is thrown 
too far, the person will aim a little closer. From this simple example, we notice 
that people tend to vary the aim to compensate for the error in performance. We 
also observe that people are quite willing to practice the task until they finally 
succeed. We thus pose the question, why not use this approach for a robot? 

We develop task-level learning procedures that attempt to mimic the approach 
people tend to take. The learning procedures formalize the process of correcting 
for the errors that occur when a robot performs a task. The procedures require 
that a robot system practices a task, monitors its own performance, and adjusts 
its commands until the task is performed correctly. A model is used to translate 
each new task-level command into actuator commands that drive the robot. 

The learning procedures are called task-level because they directly refine task- 
level commands, not the low-level actuator commands that drive the robot. In 
ball-throwing, the learning procedures improve the performance on the task by 
adjusting where the robot system is aiming the ball. The model that transforms 
this aim into actuator commands is not adjusted. No model calibration is per- 
formed during learning. Instead, the aim is adjusted and the model is used to 
automatically recompute the low-level actuator commands that drive the robot. 
By judiciously adjusting the robot's task-level aim, the learning procedures com- 
pensate for inaccuracies in the model and improve the robot's performance of the 
task.

Learning at the task level is a promising method of improving a robot's
performance of a task. First, less data is necessary to refine the task-level command
than to perform extensive model calibration. Instead of making a large number of 
trial motions to calibrate the system, trial motions that actually attempt the task 
provide a more concise method of achieving the task goal. Second, learning at the 
task level reduces the degrees of freedom of the models to be learned. Instead of 
adjusting all the parameters of the kinematics, dynamics, actuation, and sensing 
models of a robot, only the task-level commands of the system are modified. In 
making the task-level adjustments, the learning procedures compensate for the 
structural modeling errors in the lower level component models of the robot. Ul- 
timately, task-level learning and other types of model calibration can probably be 
used simultaneously to improve performance. 

Thesis Outline 

In Chapter 2 we develop two learning procedures in the context of a throwing 
task. In this task, the third link of a robot is used to catapult a ball at a target. 
The robot aims at a target that is a known distance away. A model of ballistics, 
kinematics, and dynamics is used to calculate the actuator commands that should 
drive the robot arm to swing forward, lofting the ball onto the target. The robot 
throws the ball with this sequence of commands. When the ball hits the target 
area, the robot uses a camera to monitor the exact landing location of the ball. 
The robot uses this measure of its own performance — how closely the ball landed 
to the target — to change its aim for the next throw. 

After discussing how the robot improved its performance of the throwing task, 
we analyze and generalize the learning procedures. We first describe the task 
model that is used to represent the ball-throwing task. We explain how the 
throwing aim is modified to perform the task. We elaborate on the conditions 
required for two different task-level learning procedures to converge to the desired 
performance. We demonstrate that accurate internal models of the robot improve 
the speed of the learning process. We then extend the learning algorithms to 
multi-dimensional tasks. 

In Chapter 3 we demonstrate the effectiveness of task-level learning on a com- 
plex task — juggling. In this task the robot bounces a ball on a paddle. A vision 
system tracks a tennis ball in the air and estimates the time and location at which 
it should be hit. Based on this estimate, the robot system computes a sequence of 
actuator commands that drive the paddle with the proper upward motion. After 
the paddle hits the ball during the swing, the robot system tracks the motion of 
the ball, monitors its performance, and prepares for the next hit. For each hit, 
the robot tries to hit the ball so that it will land at the center of the paddle on 
the next bounce.

We describe a sequence of task-level learning experiments on the juggling
system. A task model is first used to describe the juggling task. With this task
model and the learning algorithms, the system learns to perform the first hit in a 
juggling sequence. The robot system then learns to successfully perform two con- 
secutive hits. The juggling system finally uses a task-level, state-based learning 
algorithm to successfully hit the ball more than 70 times in a row. At each step, 
the performance of the juggling system is dramatically improved when task-level 
learning algorithms are applied. 

In Chapter 4 we explain how other researchers have improved the performance 
of robot systems. We survey a variety of recent descriptions of robot systems, 
analyzing the approaches researchers took to improve system performance. We 
discuss calibration approaches which improve the accuracy of the models used to 
control the robot. We also discuss iterative and feedback control schemes that are 
similar to the task-level learning procedures that we develop. Finally, we devote a 
section to identifying the tradeoffs between calibration and learning approaches. 

Thesis Goals 

We are pursuing two specific goals with this research. The first is to develop some 
general learning principles that improve a robot's performance of a wide range of 
tasks, of which throwing and juggling are examples. A second goal is to explore 
learning at the task level, a method that is complementary to extensive component 
model calibration. Both goals are explored by developing and implementing task- 
level learning on two robot tasks — throwing and juggling. 



Chapter 2 

Learning the Ball-Throwing Task 



In this chapter, we present a theoretical and experimental framework for task- 
level learning. This learning research is developed in the context of a robot ball- 
throwing task. We begin by describing a ball-throwing task that a robot system 
performs. We represent the task and system with a task model. Two learning 
procedures are then developed and applied to improve the performance of the ball- 
throwing system. We present experimental results, and assess the convergence 
and performance of each learning procedure. Finally, we generalize the task-level 
learning procedures so that they can be applied to more complicated tasks. 

2.1 Introduction 

We have demonstrated the process of task-level learning with a ball-throwing 
robot system that improves its performance with practice (Figure 2.1). Given the 
location of the target, the system uses a ballistic model, a kinematic model, a tra- 
jectory model, and a dynamics model to calculate a sequence of torque commands 
to drive the robot arm. Using this set of actuator commands and a simple feed- 
back controller, the robot throws the ball at the target. A vision system measures 
where the ball lands with respect to the target. Based on the error in performance, 
the system applies a task-level learning procedure to modify its aim — where it is 
trying to throw the ball. With this new aim and the same models of ballistics, 
kinematics, trajectories, and dynamics, the system computes a new set of com- 
mands to drive the arm. The robot system then throws the ball again with the 
updated sequence of actuator commands. The robot continues this sequence of 
performing the task, monitoring its own performance, and refining its aim until 
the task is successfully performed. 

The task-level learning procedure modifies the system's aim based on the error
in task performance. The learning procedure is based on our observation that
a person adjusts his aim to compensate for errors in performance. If the first
throw lands further than the target, the robot system will aim closer. If the first
throw lands too close, the robot system will aim further. We have developed
two learning algorithms, fixed-model learning and refined-model learning, that
adjust the system's aim to compensate for errors in performance. Both learning
procedures improve the performance of the ball-throwing robot system.

Figure 2.1: Ball-Throwing Robot System

2.2 The Ball Throwing Task 

The task is to throw a ball at a target. Figure 2.1 illustrates the robot system 
that is configured to accomplish this task. The system includes the MIT Serial 
Link Direct Drive Arm [An, Atkeson, and Hollerbach 1988], a target plate, and a 
video camera. 

The last link of the robot is used to throw the ball at the target plate. The 
robot is positioned so that the last link of the arm rotates in a vertical plane. A 
0.04 m rubber ball is placed into a 0.035 m diameter hole at the end of the third 
link of the robot. The robot swings this link through a 180° arc, catapulting the 
ball to the target. The ball leaves the hole as the robot arm decelerates during 
the throw. No release mechanism is used, but the release position is assumed to 
be when the last link is approximately halfway through the trajectory. The height 
of the ball when it hits the target plate is monitored and improved by learning. 

A video camera records where the ball hits the target plate. The impact of the 
ball on the target plate is sensed by a force sensor. This signal is used to choose
the video frames which are stored for later analysis. After the throw, the location
of the ball on the target plate is manually measured from the appropriate video 
frame. This location measures the robot system's performance of the task. 

Much like our conception of the human thrower, the robot system aims at 
the target. Based on the measured distance between robot and target, an inverse 
ballistics model calculates the desired release velocity of the ball. A simple ballistic 
model, including only gravity, is used to represent the flight of the ball 

$$ y_d = v_{ball} \sin(\theta_{ball})\, t - \tfrac{1}{2}\, g\, t^2 \tag{2.1} $$

$$ x_d = v_{ball} \cos(\theta_{ball})\, t \tag{2.2} $$

For a given position of the target $(x_d, y_d)$ and release angle of the ball ($\theta_{ball}$),
the necessary release velocity is calculated by eliminating the variable $t$ from
Equations (2.1) and (2.2). The release angle of the ball ($\theta_{ball} = 45^\circ$) remains fixed
throughout the experiment as part of the task strategy.
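The elimination of t can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names, the value of g, and the sign convention (height measured upward from the release point) are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def release_speed(x_d, y_d, theta_ball_deg=45.0, g=G):
    """Release speed that lands a drag-free ball at (x_d, y_d).

    Eliminating t from Equations (2.1) and (2.2) gives
        v^2 = g * x_d^2 / (2 * cos(theta)^2 * (x_d * tan(theta) - y_d)).
    """
    theta = math.radians(theta_ball_deg)
    drop = x_d * math.tan(theta) - y_d
    if drop <= 0.0:
        raise ValueError("target is not reachable at this release angle")
    return math.sqrt(g * x_d ** 2 / (2.0 * math.cos(theta) ** 2 * drop))

def landing_height(v, x_d, theta_ball_deg=45.0, g=G):
    """Forward model: height of the ball when it reaches horizontal distance x_d."""
    theta = math.radians(theta_ball_deg)
    t = x_d / (v * math.cos(theta))                     # from Equation (2.2)
    return v * math.sin(theta) * t - 0.5 * g * t ** 2   # Equation (2.1)

# Target position quoted later in the chapter: 5.75 m away, 0.9 m below the robot.
v = release_speed(5.75, -0.9)
```

Feeding the computed speed back through `landing_height` recovers the target height, which is a quick consistency check on the algebra.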

An inverse kinematics model relates the desired release angle and release velocity
of the ball to the joint angle and joint velocity of the arm. The model relates
the release angle of the ball ($\theta_{ball}$) to the angle of the arm ($\theta_{arm}$) by an offset

$$ \theta_{arm} = \theta_{ball} + 90^\circ \tag{2.3} $$

The offset is set to 90° since the ball leaves in a direction perpendicular to the 
arm. Figure 2.1 shows that the ball will be released at 45° ($\theta_{ball} = 45^\circ$) when
the arm is 135° ($\theta_{arm} = 135^\circ$) from the horizontal. The kinematics model also
calculates the angular velocity of the arm based on the desired release velocity 
and the length of the arm. 

An inverse trajectory model computes the sequence of joint angles necessary 
to swing the arm. The joint is servoed to a fifth-order polynomial trajectory that 
moves the arm through a 180° arc, from 225° to 45°. The arm is accelerated 
from rest until it reaches 135°, and then decelerated to rest at 45°. The desired 
release angle and desired release velocity are assumed to be midway through the 
trajectory. 
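The text does not state the boundary conditions of the fifth-order polynomial. As a sketch only, assume a rest-to-rest quintic with zero velocity and zero acceleration at both ends; under that assumption the arm passes through 135° at mid-duration and its speed peaks there, which matches the description above. The function names are hypothetical.

```python
def quintic_angle(t, duration, start_deg=225.0, end_deg=45.0):
    """Joint angle (degrees) at time t on a rest-to-rest quintic (assumed form)."""
    s = min(max(t / duration, 0.0), 1.0)            # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return start_deg + (end_deg - start_deg) * blend

def quintic_velocity(t, duration, start_deg=225.0, end_deg=45.0):
    """Joint velocity (degrees per second) on the same quintic."""
    s = min(max(t / duration, 0.0), 1.0)
    dblend = 30 * s**2 - 60 * s**3 + 30 * s**4       # derivative of blend w.r.t. s
    return (end_deg - start_deg) * dblend / duration
```

With this form the peak speed is 1.875 times the average speed, so shortening the trajectory duration raises the release velocity in inverse proportion.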

An inverse dynamics model and a feedback controller are used to accurately
drive the robot arm along the desired trajectory. Feedforward torques are calculated
using the acceleration profile of the trajectory and the estimated inertia of
the third link. A position-velocity feedback controller ensures that the arm closely
follows the desired trajectory on each throw. The control law that includes both
feedforward and feedback torques is

$$ \tau = \tau_{ffwd} - K_p (\theta - \theta_d) - K_v (\dot{\theta} - \dot{\theta}_d) \tag{2.4} $$

where $K_p$ and $K_v$ are the position and velocity feedback gains.
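As a small illustration, the control law of Equation (2.4) can be written as a single function. The argument names are assumptions, and any gain values passed in would be hypothetical since the text does not report them.

```python
def control_torque(tau_ffwd, theta, theta_dot, theta_d, theta_dot_d, k_p, k_v):
    """Equation (2.4): feedforward torque plus position-velocity feedback."""
    position_error = theta - theta_d            # actual minus desired joint angle
    velocity_error = theta_dot - theta_dot_d    # actual minus desired joint velocity
    return tau_ffwd - k_p * position_error - k_v * velocity_error
```

When the arm is exactly on the desired trajectory both error terms vanish and the command reduces to the feedforward torque; when the arm lags behind the desired angle, the feedback term adds torque.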

We can formalize the sequence of component models that describe the ball-
throwing system. The desired task performance, which we term the aim, can be
mathematically related to the desired torque commands that are computed by the
models

$$ \text{torque commands} = \hat{D}^{-1}(\hat{T}^{-1}(\hat{K}^{-1}(\hat{B}^{-1}(\text{aim})))) \tag{2.5} $$

where $\hat{D}^{-1}$ represents the inverse dynamics model, $\hat{T}^{-1}$ the inverse trajectory
model, $\hat{K}^{-1}$ the inverse kinematics model, and $\hat{B}^{-1}$ the inverse ballistics model.
We use a caret (^) to denote a model. As described above, the ballistics model
computes the necessary release velocity of the ball, the kinematics model then 
calculates the angular velocity of the arm, the trajectory model computes the 
necessary time-sequence of joint angles, and the dynamics model calculates the 
torque commands that drive the joint actuator. 

2.3 The Problem and a Solution 

In this section, we explain why performance improvement is necessary for
the ball-throwing system. We begin by describing the errors that occur when
the robot system throws a ball, that is, how far from the target the ball actually lands.
We explain that the errors in throwing are due to modeling errors. Finally, we
present the rationale for using task-level learning to improve the performance of
the ball-throwing robot system.

2.3.1 What is the Problem? 

When the robot throws the ball using these models, the ball misses the target. 
Based on the measured distance to the target, 5.75 m, the robot system uses the 
component models described by Equation (2.5) to calculate the required torque 
commands. The ball is thrown when the robot is commanded with these joint 
commands. The ball lands 0.28 m above the target in the target plane. We term 
the 0.28 m error a performance error. Based on the models described above, the 
robot is unable to successfully throw the ball at the target. 

The throw misses the target because our models are inaccurate descriptions
of the real system. The ballistic, kinematic, trajectory, and dynamics models
only approximate the ball-throwing system. As a result, when the ball is initially
thrown, the actual performance differs from the desired performance. Many
factors can cause this performance error. These factors include an inaccurate
measurement of target location, inaccurate kinematic model, inaccurate vision
system, inaccurate feedforward torques, air resistance, torque saturation, sensor
miscalibration, and sensor noise, as well as errors in release angle and release
velocity. With all these different factors, it is difficult to improve the models so that
the ball-throwing system can reliably hit the target.

Figure 2.2: Ball-Throwing Task Model

2.3.2 What are the Solutions? 

Given a system with performance errors, we ask the question: what are the
solutions? We suggest that two different approaches can improve the performance
of the ball-throwing system: (1) the models of the ball-throwing system can be
made more accurate by calibration, or (2) learning algorithms can be applied at
the task level.

Our goal in this research is to explore the second option. Most researchers in 
robotics address the first approach, and we refer the reader to Chapter 4 for a 
discussion of robot calibration. We choose to explore an approach that we call 
task-level learning. Instead of calibrating the model of the system, we modify the 
robot system's task-level command. The task-level command is the target location
to which the robot system tries to throw the ball. We begin by formalizing the 
notion of a task model in the next section, and then develop two task-level learning 
algorithms. 

2.4 The Task Model 



In this section, we formalize the concept of a task model. A task model relates 
the desired performance of a system to the actual performance. In the case of 
ball-throwing, the task model relates where the system is trying to throw the 
ball — the aim — to where the ball actually lands — the hit. In the ideal case, when 
we have a perfect model of the robot system, a task model can be represented by 
the identity transform, suggesting that the ball will hit wherever the system aims. 
In general, the task model only approximates the identity transformation. 

The task model is composed of two transformations: the inverse model of the
system and the system transformation. A block diagram of the task model for
the ball-throwing system is shown in Figure 2.2. For ball-throwing, the inverse
model of the system is composed of the four component models derived in
Section 2.2. This sequence of models transforms an aim (where to hit the ball) to a
sequence of actuator commands. The inverse model transformation was described
by Equation (2.5), which we rewrite here

$$ \text{commands} = \hat{D}^{-1}(\hat{T}^{-1}(\hat{K}^{-1}(\hat{B}^{-1}(\text{aim})))) \tag{2.6} $$

This equation, which includes each component model of the ball-throwing system,
can be collapsed into

$$ \text{commands} = \hat{S}^{-1}(\text{aim}) \tag{2.7} $$

where $\hat{S}^{-1}()$ represents the inverse model of the system.

The second part of the task model is the system transformation that describes 
the ball-throwing robot. This transformation is determined experimentally by 
commanding the robot actuators and measuring the landing point of the ball. 
The system transformation relates actuator commands to the landing position of 
the ball, which we call the hit. Formally 

hit = S(commands) (2.8) 

where S() denotes the ball-throwing system transformation. 

Together, the inverse model and the system transformation describe the task.
By simply combining Equations (2.7) and (2.8), we obtain a task model

$$ \text{hit} = S(\hat{S}^{-1}(\text{aim})) \tag{2.9} $$

In the ideal case, when the model perfectly describes the system, $S(\hat{S}^{-1}())$ reduces
to the identity transformation. As discussed above, the ball-throwing models
only approximate the robot system, and so the ball-throwing task model only
approximates the identity transformation.
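The composition in Equation (2.9) can be illustrated with a toy scalar system. Everything below is hypothetical: the "system" is a made-up affine function of a single command, chosen only to show that a perfect inverse model yields the identity while an imperfect one does not.

```python
def make_task_model(system, inverse_model):
    """Compose Equation (2.9): hit = S(S_hat^{-1}(aim))."""
    def task_model(aim):
        commands = inverse_model(aim)   # Equation (2.7)
        return system(commands)         # Equation (2.8)
    return task_model

# Hypothetical scalar system: the command is a single number.
def true_system(command):
    return 2.0 * command + 0.5

def perfect_inverse(aim):
    return (aim - 0.5) / 2.0            # exactly inverts true_system

def imperfect_inverse(aim):
    return aim / 1.8                    # wrong gain and no offset term

ideal = make_task_model(true_system, perfect_inverse)
actual = make_task_model(true_system, imperfect_inverse)
```

Here `ideal(aim)` returns the aim unchanged, whereas `actual(aim)` overshoots, which is the kind of aim/hit mismatch that task-level learning is meant to correct.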

2.5 Task-Level Learning 

In this section, we develop a task-level learning approach that improves the perfor- 
mance of the ball-throwing robot. The basis of task-level learning is to modify the 
system's task-level command based on errors in task performance. The learning 
procedure formalizes our interpretation of the human ball thrower who modifies 
his aim based on how far the ball landed from the target. 

We develop two general learning procedures — fixed-model learning and refined- 
model learning — that improve system performance with practice. Both refine the 
system's aim based on errors in task performance. Fixed-model learning uses the 
task model to transform this new aim into new robot commands. Refined-model 
learning is an interpolation procedure that refines the task model while computing 
the robot commands.

2.5.1 Fixed-Model Learning

In fixed-model learning, the correct aim is estimated based on the performance 
errors. In ball throwing, the aim is updated after each throw by the amount 
the ball missed the target. This measured error, whether positive or negative,
updates the aim as a running sum

$$ \text{aim}_{n+1} = \text{aim}_n - (\text{hit}_n - \text{target}) \tag{2.10} $$

This new aim is transformed through the inverse ball-throwing model to calculate
the refined robot commands

$$ \text{command}_{n+1} = \hat{S}^{-1}(\text{aim}_{n+1}) \tag{2.11} $$

A physical interpretation is helpful for understanding Equations (2.10) and (2.11). 
In the case where the ball falls short, the performance error is negative, raising the 
aim by that amount. This action corresponds to our intuition that we should aim 
higher if we are hitting too low. Together, Equations (2.10) and (2.11) provide 
the basis for fixed-model learning. 

Fixed-model learning was applied to the ball throwing task. The target was
placed at a horizontal distance of 5.75 m and a height of -0.9 m from the robot. Using
the ballistic, kinematic, trajectory, and dynamics models of the robot system,
a set of joint commands was calculated using Equation (2.11). ($\text{aim}_0$ and $\text{hit}_0$ are
each defined to be the target.) The robot threw the ball with this set of torque
commands, resulting in a hit of 6.03 m in the target plane. Based on the per- 
formance error of 0.28 m, the aim was modified to 5.47 m using Equation (2.10). 
Once again, the models of the robot system described in Equation (2.11) were 
used to calculate a new sequence of joint commands, and the ball was thrown 
again. The ball hit at 5.97 m, a new aim was calculated to be 5.25 m, and the 
model of ball throwing was applied again to calculate joint commands. This it- 
erative learning procedure continued until the robot successfully completed the 
task. The aim for the successful throw was 4.78 m, almost one meter closer than 
the target. The open boxes connected by a dashed line in Figure 2.3 show how 
errors in performance were reduced with practice. The open boxes in Figure 2.4 
show how the sequence of aims converged to 4.78 m in the target plane during the 
learning process. By applying fixed-model learning, the robot system successfully 
performed the task on the eighth iteration. 
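The iteration can be sketched with a toy stand-in for the robot. The linear task model below is an assumption made for illustration: its slope of 0.31 is the value the text reports for the fitted aim/hit line, and its intercept is chosen so that an aim of 4.78 m hits the 5.75 m target. It does not reproduce the measured throws exactly.

```python
def fixed_model_learning(task_model, target, n_throws):
    """Iterate Equation (2.10); returns a list of (aim, hit) pairs."""
    aim = target                      # the first throw aims at the target itself
    history = []
    for _ in range(n_throws):
        hit = task_model(aim)         # throw and measure where the ball lands
        history.append((aim, hit))
        aim = aim - (hit - target)    # Equation (2.10)
    return history

# Hypothetical linear task model (a toy stand-in, not the robot).
TARGET = 5.75
SLOPE = 0.31
INTERCEPT = TARGET - SLOPE * 4.78

def toy_task_model(aim):
    return SLOPE * aim + INTERCEPT

history = fixed_model_learning(toy_task_model, TARGET, 8)
errors = [hit - TARGET for _, hit in history]
```

In this toy model each performance error is 0.69 times the previous one, and the aims move from 5.75 m toward 4.78 m, mirroring the qualitative behavior described for the experiment.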

It is important to generalize the fixed-model learning procedure so that it can 
be applied to other tasks. The system task command is the aim, labeled with 
the vector aim, and the task performance is the hit, labeled with the vector hit. 
Generalizing the error correction equation to a multi-dimensional task, we write 

$$ \mathbf{aim}_{n+1} = \mathbf{aim}_n - (\mathbf{hit}_n - \mathbf{target}) \tag{2.12} $$

where $\mathbf{target}$ is the desired system performance. We also extend the task model
equation to

$$ \mathbf{hit}_{n+1} = S(\hat{S}^{-1}(\mathbf{aim}_{n+1})) \tag{2.13} $$

Fixed-model learning has successfully been applied to a multi-dimensional task in 
the case of trajectory following [Atkeson and McIntyre 1986]. A similar approach
has also been developed for the task of kinematic positioning at a visual target 
[Atkeson et al. 1987]. 

The convergence of fixed-model learning depends on how accurately the model 
describes the behavior of the system. The convergence criteria can be derived by 
using fixed point theory [Wang 1984; Wang and Horowitz 1985]. A learning 
algorithm can be viewed as a mapping of aims on the nth attempt to aims on the 
next attempt 

$$ \mathbf{aim}_{n+1} = F(\mathbf{aim}_n) \tag{2.14} $$

The fixed-model learning algorithm can be put into this form by substituting 
Equation (2.13) into Equation (2.12). Fixed-model learning modifies the nth aim 
by adding an amount based on the aim transformed by the task model 

$$ \mathbf{aim}_{n+1} = \mathbf{aim}_n - S(\hat{S}^{-1}(\mathbf{aim}_n)) + \mathbf{target} \tag{2.15} $$

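Note that when the correct hit, $\mathbf{hit}^*$, is achieved by using the correct aim, $\mathbf{aim}^*$,
then $\mathbf{hit}^* = S(\hat{S}^{-1}(\mathbf{aim}^*))$. In this case, $\mathbf{target} = S(\hat{S}^{-1}(\mathbf{aim}^*))$, and
Equation (2.15) reduces to the fixed point $\mathbf{aim}_{n+1} = \mathbf{aim}_n = \mathbf{aim}^*$.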

We can ask whether this fixed point is stable by analyzing a linearization of
Equation (2.15) at the point $(\mathbf{aim}, \mathbf{hit}) = (\mathbf{aim}^*, \mathbf{target})$. We begin by writing
an equation for small perturbations around the fixed point. For a perturbation
$\delta\mathbf{aim}$ from the fixed point,

$$ S(\hat{S}^{-1}(\mathbf{aim}^* + \delta\mathbf{aim})) = S(\hat{S}^{-1}(\mathbf{aim}^*)) + J\hat{J}^{-1}\,\delta\mathbf{aim} \tag{2.16} $$

where $J$ is the Jacobian matrix for the system transformation $S()$, $\hat{J}$ is the
Jacobian matrix for the model $\hat{S}()$, and $\hat{J}^{-1}$ is the Jacobian matrix for the inverse
model $\hat{S}^{-1}()$. $J$, $S()$, $\hat{J}^{-1}$, and $\hat{S}^{-1}()$ are all evaluated at the fixed point. To
analyze the fixed point for stability, we consider the case in which the nth aim
is perturbed from $\mathbf{aim}^*$ by $\delta\mathbf{aim}_n$ so that $\mathbf{aim}_n = \mathbf{aim}^* + \delta\mathbf{aim}_n$. The change
in the aim, $\delta\mathbf{aim}_{n+1} = \mathbf{aim}_{n+1} - \mathbf{aim}^*$, can be computed by substituting
Equation (2.16) into Equation (2.15), and obtaining

$$ \delta\mathbf{aim}_{n+1} = (I - J\hat{J}^{-1})\,\delta\mathbf{aim}_n \tag{2.17} $$
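The substitution can be spelled out step by step. The lines below are a worked restatement of the argument, writing $\hat{S}^{-1}$ for the inverse model and $J\hat{J}^{-1}$ for the Jacobian of the task model at the fixed point, and using the fixed-point property that the target equals $S(\hat{S}^{-1}(\mathbf{aim}^*))$.

```latex
\begin{align*}
\delta\mathbf{aim}_{n+1}
  &= \mathbf{aim}_{n+1} - \mathbf{aim}^* \\
  &= \mathbf{aim}_n - S(\hat{S}^{-1}(\mathbf{aim}_n)) + \mathbf{target} - \mathbf{aim}^*
     && \text{by (2.15)} \\
  &= \delta\mathbf{aim}_n
     - \left[ S(\hat{S}^{-1}(\mathbf{aim}^*)) + J\hat{J}^{-1}\,\delta\mathbf{aim}_n \right]
     + \mathbf{target}
     && \text{by (2.16)} \\
  &= (I - J\hat{J}^{-1})\,\delta\mathbf{aim}_n
     && \text{since } \mathbf{target} = S(\hat{S}^{-1}(\mathbf{aim}^*))
\end{align*}
```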

The matrix $(I - J\hat{J}^{-1})$ provides a necessary condition for convergence of
fixed-model learning. When the task model is a linear function of the aim, the matrix
$(I - J\hat{J}^{-1})$ provides global convergence criteria. The error in the aim, $\delta\mathbf{aim}$,
will decrease when all the eigenvalues of the matrix $(I - J\hat{J}^{-1})$ are less than one
in absolute value, with the rate of decrease determined by the magnitude of the
eigenvalues. If the magnitudes of all the eigenvalues are less than one, the learning
process is stable and performance improves with practice. The magnitude of the
eigenvalues of $(I - J\hat{J}^{-1})$ depends on how accurately $\hat{J}^{-1}$ inverts $J$, and thus the
stability of the learning algorithm depends on how closely the model inverts the
controlled system. Thus, better modeling improves the stability and the speed of
the learning process.

Figure 2.5: Aim/Hit Behavior of the Task Model

In the general case where the task model is a non-linear function of the aim,
it is difficult to develop global convergence criteria. The criteria developed for
the linear case can be applied locally, however, as a necessary but not sufficient
condition for convergence. Thus, all the eigenvalues of the matrix $(I - J\hat{J}^{-1})$
must be less than one in absolute value for learning to converge. If the magnitude
of any eigenvalue is greater than one, fixed-model learning will almost certainly
degrade performance. The better the model approximates the system, the closer
the magnitudes will be to zero, and the more likely learning is to converge.
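The local test can be sketched for a two-dimensional task in plain Python. The 2x2 restriction, the function names, and the example Jacobians are assumptions for illustration; a real system would estimate the system Jacobian and the inverse-model Jacobian from data.

```python
import cmath

def learning_matrix(J, J_hat_inv):
    """Return I - J * J_hat_inv for 2x2 matrices given as nested lists."""
    prod = [[sum(J[i][k] * J_hat_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return [[(1.0 if i == j else 0.0) - prod[i][j] for j in range(2)]
            for i in range(2)]

def eigenvalue_magnitudes(M):
    """Eigenvalue magnitudes of a 2x2 matrix from its characteristic polynomial."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return sorted(abs((trace + sign * root) / 2.0) for sign in (1, -1))

def locally_convergent(J, J_hat_inv):
    """True when every eigenvalue of I - J * J_hat_inv has magnitude below one."""
    return max(eigenvalue_magnitudes(learning_matrix(J, J_hat_inv))) < 1.0
```

For example, with a hypothetical task-model Jacobian of diag(0.31, 1.0) the magnitudes are 0 and 0.69 and the test passes, while diag(2.5, 1.0) gives a magnitude of 1.5 and the test fails.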

We applied these local convergence criteria to the ball throwing system. A plot
of the aim/hit behavior of the task model is shown in Figure 2.5. The aims and hits
are measured along the target plane. The data for the plot was experimentally
determined by commanding the robot system with a number of different aims,
and recording the landing position of the ball. Note that in the ideal case, when
the inverse model perfectly describes the system, the task model reduces to the
identity transform, producing the dotted line on the plot. The task model was fit
by a linear function: the quantity $J\hat{J}^{-1}$ was estimated to be 0.31. The eigenvalue
of $(I - J\hat{J}^{-1})$ for this task model was then calculated to be 0.69. This value is less
than one, indicating that the ball throws are likely to converge to the target. The
open boxes in Figure 2.3 demonstrate that the ball throws did in fact converge.

The performance of the learning procedure refers to the rate of convergence.
The geometric rate of convergence is given by the magnitudes of the eigenvalues of
$(I - J\hat{J}^{-1})$. The best performance is achieved when all the eigenvalues are close
to zero. In the ball throwing task, for example, the geometric rate of convergence
was calculated to be 0.69. The stars in Figure 2.6 illustrate this theoretical rate of
convergence. The actual iterations of fixed-model learning, denoted by open boxes,
closely approximate this prediction. If the model is made more accurate, the
eigenvalues of the matrix $(I - J\hat{J}^{-1})$ approach zero, leading to faster convergence,
improved learning, and better noise rejection.



2.5.2 Refined-Model Learning 

Refined-model learning refines the task model as well as the aim during practice.
The refined-model approach constructs a local linear model of the system from
the last m + 1 attempts at the task, given a system of m inputs and m outputs.
Thus, this method is an alternative to fixed-model learning only after the (m + 1)th
iteration. Once in use, refined-model learning sacrifices the original model
structure for a simpler local model. This local model is updated after each attempt 
at the task, leading to a refined system command.

In ball throwing, refined-model learning was applied after two attempts at
the task. In order to easily implement refined-model learning, a scalar quantity
was necessary to characterize the robot command. We chose trajectory duration
because the time-length of the throwing motion directly affected the release
velocity, which in turn affected the distance the ball was thrown. As in our previous
experiments, the first throw with a trajectory duration of 138 ms resulted in a
performance error of 0.28 m. The second throw with a 143 ms duration resulted
in a performance error of 0.22 m. Refined-model learning linearly extrapolated
from these two points, suggesting a throw with a trajectory duration of 160 ms.
Given the performance error, $e_n$,

$$ e_n = \text{hit}_n - \text{target} \tag{2.18} $$

the iteration rule for refined-model learning is

$$ \text{command}_{n+1} = \text{command}_n - \frac{e_n}{(e_{n-1} - e_n)/(\text{command}_{n-1} - \text{command}_n)} \tag{2.19} $$

The results of refined-model learning for the ball throwing task are given by the 
diamonds in Figure 2.3. The desired performance was reached in just five itera- 
tions. 
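The scalar update of Equation (2.19) is a secant step and can be sketched directly. The function name is hypothetical. With the rounded figures quoted in the text, the rule gives roughly 161 ms, close to the reported 160 ms; the small gap presumably comes from rounding in the quoted errors.

```python
def refined_model_step(command_prev, error_prev, command_curr, error_curr):
    """One update of Equation (2.19) for a scalar command."""
    # Local linear model: slope of error versus command from the last two throws.
    slope = (error_prev - error_curr) / (command_prev - command_curr)
    if slope == 0.0:
        raise ZeroDivisionError("the last two throws gave identical errors")
    return command_curr - error_curr / slope

# Trajectory durations (ms) and performance errors (m) quoted in the text.
next_duration_ms = refined_model_step(138.0, 0.28, 143.0, 0.22)
```

Because the error fell as the duration grew, the estimated slope is negative and the rule lengthens the trajectory again to drive the remaining error toward zero.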

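Refined-model learning can be generalized to multi-dimensional tasks. We
first define the performance error, $\mathbf{e}_n$, to be the difference between the hit and the
target on the nth iteration

$$ \mathbf{e}_n = \mathbf{hit}_n - \mathbf{target} \tag{2.20} $$

We next define $\Delta C$ and $\Delta E$ to be m x m matrices

$$ \Delta C = [\, \mathbf{c}_0 - \mathbf{c}_n \quad \mathbf{c}_1 - \mathbf{c}_n \quad \cdots \quad \mathbf{c}_{n-1} - \mathbf{c}_n \,] \tag{2.21} $$

$$ \Delta E = [\, \mathbf{e}_0 - \mathbf{e}_n \quad \mathbf{e}_1 - \mathbf{e}_n \quad \cdots \quad \mathbf{e}_{n-1} - \mathbf{e}_n \,] \tag{2.22} $$

where $\mathbf{c}$ denotes the command. The general refined-model learning equation is
then written as

$$ \mathbf{c}_{n+1} = \mathbf{c}_n - \Delta C\,(\Delta E)^{-1}\,\mathbf{e}_n \tag{2.23} $$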

Refined-model learning in this form is similar to the secant method of finding 
zeros of functions [Gragg and Stewart 1976]. Improvements for avoiding singu- 
larities and for hastening convergence are described in the numerical methods 
literature and can be readily applied. Our primary interest here is to propose 
general learning procedures which can later be refined if they appear promising. 
Refined-model learning will converge only if the first attempts at the task are 
sufficiently near the desired performance and if the system function is sufficiently 
smooth in that neighborhood. Because of these two restrictions, a principled and 
conservative approach to extrapolation should be taken. In the case of a one- 
dimensional system, for example, it might be useful to set a limit as to how far to 
extrapolate. It might also be wise to interpolate instead of extrapolating as soon 
as a point on both sides of the desired performance is found [Press et al. 1986]. 
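
The multi-dimensional update of Equation 2.23 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the attempts are passed oldest first, and a plain Gaussian elimination stands in for whatever linear solver an implementation would actually use. The singularity check reflects the caution above: if the last attempts do not span the command space, no update is produced.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: attempts are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def refined_model_update(commands, errors):
    """Equation 2.23 from the last m + 1 (command, error) pairs, oldest first."""
    c_n, e_n = commands[-1], errors[-1]
    m = len(c_n)
    # Column j of each matrix is attempt j minus the latest attempt.
    d_c = [[commands[j][i] - c_n[i] for j in range(m)] for i in range(m)]
    d_e = [[errors[j][i] - e_n[i] for j in range(m)] for i in range(m)]
    w = solve(d_e, e_n)  # w = (Delta E)^-1 e_n
    return [c_n[i] - sum(d_c[i][j] * w[j] for j in range(m)) for i in range(m)]
```

For a system whose error is an exactly linear function of the command, a single update from m + 1 independent attempts lands on the zero-error command; for a smooth nonlinear system it only approximates it.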

The performance of refined-model learning depends primarily on the com- 
mand/performance behavior of the system. The performance on the first learning 
iteration also depends on the accuracy of the internal model. A better model 
makes the original performance error smaller, making learning faster. In the case 
of a one-dimensional system, the performance error, e n , will decay superlinearly. 
This convergence rate is faster than for fixed-model learning, which only converges 
geometrically [Forsythe, Malcolm, and Moler 1977]. Figure 2.3 demonstrates this 
performance advantage in the throwing task. 
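
The two convergence rates can be contrasted compactly. The following is the standard textbook characterization, assuming a one-dimensional system function that is smooth with a nonzero slope at the desired performance; the symbols \rho, C, and p are generic constants introduced here, not values measured in the experiments.

```latex
% Fixed-model learning: geometric (linear) convergence
|e_{n+1}| \approx \rho \, |e_n|, \qquad 0 < \rho < 1

% Refined-model (secant) learning: superlinear convergence
|e_{n+1}| \approx C \, |e_n|^{p}, \qquad p = \frac{1 + \sqrt{5}}{2} \approx 1.618
```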

2.6 Discussion 

We have demonstrated that task-level learning improves a robot's performance 
of a fairly simple task — ball throwing. After each throw, task-level commands 
are adjusted to reduce the system's performance errors. Overall, we note four 
principal contributions. 

The most important contribution is to demonstrate that task-level learning 
can take place by varying the aim of the ball-throwing system. The aim is varied 
by an amount equal to the performance error of the system, much as the person 
we observed varies his aim based on where the ball landed during the previous 
throw. The ball-throwing task model is used to interpret this new aim, and to 
calculate new commands to drive the robot system. We have demonstrated both 
experimentally and theoretically that task-level learning improves the performance 
of the ball-throwing system. 

A second contribution is to demonstrate that learning can take place at the 
task-level without extensively calibrating the component models that describe the 
robot system. Learning improved the task-level performance of the ball-throwing 
system even though the ballistic, kinematic, trajectory, and dynamic models in- 
accurately described the system. For example, the desired trajectory of the robot 
arm was never followed perfectly, but the task was accomplished nonetheless. Fig- 
ure 2.7 shows the desired and actual velocity trajectory of the final throw. That 
throw resulted in a perfect hit (zero error) even though the desired and actual 
trajectories differed. The difference is an indication that learning can proceed at 
the task level, even though lower level modules do not perform perfectly. 

A third contribution is that the task model can be made more accurate while 
the system is learning a task. Refined-model learning takes this approach, as it 
builds a local linear model of the actual system to produce the next task com- 
mand. This simple linear model replaces the task model and improves the speed 
of task-level learning on the ball-throwing task. Figure 2.3 shows that refined-model 
learning converged after five learning iterations, while fixed-model learning 
converged after eight iterations. Another potential approach is to build a local 
model on top of the task model as the robot practices the task. Such a combined 
task model could more accurately describe the subtle characteristics of the system, 
and increase the stability and the speed of task-level learning even further. 

A fourth contribution is to formalize an intuitive notion of a task model. The 
task model is a transformation that relates the desired performance to the actual 
performance of the task. The task model is composed of two parts: the system 
transformation and the inverse model of the system. In the ideal case, when 
the inverse model perfectly describes the system, the task model reduces to the 
identity transform and the system achieves the desired performance. 



Chapter 3 

Learning the Juggling Task 



In this chapter, we discuss how to build a robot system that juggles more consis- 
tently with practice. We explain that the juggling task provides a rich domain in 
which to test our task-level learning algorithms. We describe the characteristics 
of the juggling task and a robot system that achieves this task. We explain how to 
apply the learning algorithms described in Chapter 2 to improve the performance 
of a juggling robot. Finally, we provide experimental results of a robot system 
that improves its juggling performance with practice. 

3.1 Introduction 

In the juggling task, the robot bounces a platform tennis ball on a paddle. The 
task is to repeatedly hit the ball with the paddle, bouncing it up into the air much 
as a person can do with a tennis racquet and ball. The robot system monitors 
the trajectory of the ball in the air. Based on an estimated trajectory, the robot 
calculates how to hit the ball upwards and back to the center of the paddle. The 
robot should be able to perform the task indefinitely. 

The first major reason for choosing the juggling task is its multi-dimensional 
nature. The task goal is three dimensional: (1) to hit the ball to a specified height, 
(2) for the ball to land at a position x on the paddle, and (3) for the ball to land 
at a position y on the paddle. This multi-dimensional goal is in contrast to the 
one-dimensional goal in the robot throwing experiments. Juggling gives us the 
opportunity to test a multi-dimensional form of the task-level learning algorithms 
developed at the end of Chapter 2. 

A second reason to study the juggling task is the complexity of the model 
that describes the robot juggling system. The juggling model is made up of many 
component models, including vision, forward ballistics, inverse ballistics, resti- 
tution, kinematics, trajectory following, and dynamics. Ball throwing required 
three models, but juggling requires as many as eight component models. The 
sheer number of models provides an excellent test for the model-based, task-level 
learning algorithms that we have developed. 

A third reason for working on the juggling task is to raise the issue of gener- 
alization. The learning algorithm developed in the throwing task applies directly 
to the problem of learning a single aim or hitting a ball from a single location. 
However, in the juggling task the ball falls to many different locations on the 
paddle. We are thus forced to consider two learning schemes. When arbitrar- 
ily generalizing, corrections in the aim that are learned when a ball falls to one 
paddle location are used when the ball falls to all other paddle locations. When 
selectively generalizing, the learning can be indexed according to a state-space, so 
that when the ball falls to similar locations the learning is generalized and when 
the ball falls to different locations no generalization occurs. 

A fourth reason to focus on the juggling task is to examine the issue of training. 
In order to improve the system's juggling performance, the robot is trained in a 
sequence of three subtasks. The robot juggling system learns to perform the 
first hit, the second hit, and then the successive hits. Only after learning these 
subtasks can the robot perform the juggling task. The need for this sequence 
of subtasks suggests that the training process is important in learning a complex 
task. 

3.2 The Juggling Task 

In this section, we describe the characteristics of the robot juggling system. The 
component vision and robot systems are described, as well as the models that are 
used to plan each hit in a juggling sequence. From these models, we explain how 
to build a task model of juggling. Model calibration procedures are outlined to 
underscore the need for accurate models. The repeatability of the robot juggling 
system is also analyzed in order to provide a more accurate indication of the 
system's capabilities. 

The task is to bounce a ball at the end of the robot paddle. The task begins 
when a ball is dropped from the ceiling and falls towards the robot paddle. The 
task involves monitoring the flight of the ball, and estimating its downward tra- 
jectory. Based on the estimated landing location of the ball, a desired upward 
trajectory is calculated that returns the ball to the center of the paddle after the 
hit. Using these upward and downward trajectories, the robot system computes 
the necessary velocity at which to hit the ball. The robot must then calculate 
a paddle trajectory that will hit the ball with the correct velocity and the cor- 
rect angle at the correct time. After the robot moves the paddle through this 
desired trajectory and hits the ball upwards, the sequence of monitoring the ball, 
calculating a response, and hitting the ball is repeated. 

The task on the first hit is simplified by using an estimate of the trajectory 
of the ball gleaned from previous experimental data. This estimate is accurate 
because the ball is always dropped by a solenoid from the same location above 
the paddle. With this estimate, the vision system does not need to monitor 
the downward flight of the ball for the first hit of a juggling sequence. This 
simplification is necessary because there is not enough data to make an accurate 
estimate of the ball's trajectory before the robot is required to move. The vision 
system needs to track the ball moving both upwards and downwards to make an 
accurate estimate of the trajectory. Since the downward trajectory of the first 
ball is repeatable, using one prototype of the first downward trajectory allows 
us to apply the model-based learning algorithms with full generality on the first 
hit. All successive hits can be performed by monitoring the ball in real time, 
computing a response, and executing a hit trajectory. 

Figure 3.1: Juggling Robot/Vision System 



3.2.1 System Description 

A paddle is attached to the Direct Drive Arm to hit the ball (Figure 3.1). The 
paddle is 0.46 m on a side, and its center is mounted 0.175 m from the joint two 
axis and 0.464 m from the joint three axis. The paddle is made of 0.02 m premium 
plywood which is mounted on five 0.025 m wide hollow aluminum studs that are 
laid across a 0.10 m wide hollow aluminum beam. 

The task begins when a platform tennis ball is released by a solenoid from 
a distance of 1.5 m above the paddle. A vision system monitors the trajectory 
of the ball at 30 Hertz. Two video cameras (Sanyo VDC-3860) equipped with a 
half millisecond electronic shutter are mounted 5.0 m from the robot (Figure 3.1). 
One camera is positioned directly in front of the robot paddle. A second camera is 
mounted to the right of the robot paddle, perpendicular to the first camera. Each 
camera has a field of view of 0.7 by 1.0 m, centered about the point (0.0, 0.0, 1.0) 
in paddle coordinates. To simplify the vision problem, the platform tennis ball 
appears white on a black background. The robot is also never in the field of 
view of the cameras. 

The analog signal from each camera is processed by a Datacube vision system 
with the aid of a Sun host computer. The RS-170 signal from each camera is 
digitized independently using Datacube hardware. A threshold is applied to the 
digitized output, and the x, y location of each bright pixel is written into 
memory. The Sun computer averages the x, y locations of these bright 
pixels. This average gives the x, y centroid of the tennis ball in each 
camera frame. With the Datacube system operating at frame rate, camera-space 
centroids from both cameras can be extracted every 33 ms. 
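
The threshold-and-average step can be sketched as follows. This is a software illustration only, assuming the frame is available as a list of rows of gray levels; in the system described above the thresholding is done by the Datacube hardware, and the function name and threshold value are hypothetical.

```python
def centroid(frame, threshold):
    """Return the mean (x, y) location of pixels brighter than threshold, or None."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no bright pixels: the ball is outside the field of view
    return (sum_x / count, sum_y / count)


# A white 2 x 2 ball on a black background.
frame = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
]
ball = centroid(frame, threshold=128)
```

Returning None when no pixel passes the threshold gives the tracking code a simple signal that the ball has left the field of view.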

Camera-space centroid information (x_1, y_1, x_2, y_2) is sent over a serial line to 
a 68020 microprocessor, running under the Condor real-time operating system 
[Narasimhan and Siegal 1987]. This microprocessor immediately associates a time 
with the centroid data to produce a full data point of the form (x_1, y_1, x_2, y_2, t). The 
processor keeps track of the position of the ball using a simple finite-state machine. 
When a ball falls below the field of view of the cameras, the processor resets its 
centroid buffer. Once the ball reappears in the field of view, centroids are saved 
starting at the beginning of the centroid buffer. The processor also communicates 
directly with three other microprocessors that run high-level juggling code and 
control the robot arm. 

Camera-space centroids are transformed to paddle coordinates by means of the 
vision model. The transformation assumes a simple orthographic projection model 
of image formation. Since the cameras are mounted perpendicular to one another, 
the x, y, z, t centroid of a tennis ball in paddle coordinates is easily inferred from 
a full data point, (x_1, y_1, x_2, y_2, t). A simple camera system calibration procedure 
determines the parameters that describe the vision transformation. The calibration 
is discussed in Section 3.2.3. 

Based on the centroids of the ball in flight generated by the real-time vision 
system, a forward ballistics model estimates the landing point of the ball on the 
paddle. The model performs a parabolic least-squares fit of the height vs. time (z 
vs. t) centroid data. With the resulting parabolic trajectory, the model determines 
the landing time of the ball. The ball is assumed to land when it crosses the plane 
z = 0.0, the height of the paddle. (This assumption is not completely accurate, 
and will be handled in the discussion of the angle-time offset model.) The forward 
ballistics model also does a linear least-squares fit of the x vs. t and the y vs. t 
centroid data. Based on the landing time of the ball, the x and y position on 
impact is estimated. From these three fits, the velocity of the ball at impact is 
also computed. The least-squares fits in the forward ballistics model are always 
made with at least six centroids to improve the reliability of the fit. A new fit 
is made each time another centroid is acquired. For a ball that travels to about 
1.0 m in height, between 11 and 12 centroids are generally available before the 
robot arm must begin to move. The last centroid is acquired when the ball is at 
a height of 0.70 m and falling towards the paddle. 
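
The forward ballistics computation described above can be sketched in Python. This is a minimal illustration under assumptions: the function names are hypothetical, the fits are solved through the normal equations with a small hand-written elimination, and the landing time is taken as the later root of the fitted parabola at z = 0.

```python
import math


def polyfit(ts, vs, degree):
    """Least-squares polynomial fit; coefficients are returned lowest order first."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T v for the Vandermonde matrix A.
    lhs = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    rhs = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = lhs[r][col] / lhs[col][col]
            for k in range(col, n):
                lhs[r][k] -= f * lhs[col][k]
            rhs[r] -= f * rhs[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(lhs[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (rhs[r] - tail) / lhs[r][r]
    return coeffs


def predict_landing(ts, xs, ys, zs, min_points=6):
    """Return (t_land, x_land, y_land, (vx, vy, vz)) where the fit crosses z = 0."""
    if len(ts) < min_points:
        raise ValueError("too few centroids for a reliable fit")
    c, b, a = polyfit(ts, zs, 2)   # z(t) = a t^2 + b t + c
    x0, vx = polyfit(ts, xs, 1)    # x(t) = x0 + vx t
    y0, vy = polyfit(ts, ys, 1)    # y(t) = y0 + vy t
    disc = b * b - 4.0 * a * c
    if a >= 0.0 or disc < 0.0:
        raise ValueError("fit is not a falling parabola that reaches z = 0")
    t_land = (-b - math.sqrt(disc)) / (2.0 * a)  # later of the two roots
    vz = 2.0 * a * t_land + b                    # vertical velocity at impact
    return t_land, x0 + vx * t_land, y0 + vy * t_land, (vx, vy, vz)
```

Calling predict_landing again each time a centroid arrives mirrors the refitting described above; the six-point minimum is the one stated in the text.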

Based on the task goal and the estimated landing position of the ball, an 
inverse ballistics model computes the desired outgoing trajectory that the ball 
should follow. This trajectory assumes a simple ballistic model to represent the 
flight of the ball 

    x_f = x_i + \dot{x}_i \cdot t    (3.1) 

    y_f = y_i + \dot{y}_i \cdot t    (3.2) 

    z_f = z_i + \dot{z}_i \cdot t - 1/2 \cdot g \cdot t^2    (3.3) 

The outgoing trajectory is fully determined by specifying the current landing 
position of the ball and the task goal. The estimate of the current landing position 
is given by the forward ballistics model. The task goal is a three-dimensional 
vector that describes the height, z, that the ball should reach, and the x, y landing 
point on the next bounce. At the start of a juggling sequence, the task aim is to 
hit the ball to a height of 1.0 m and to have it land at the center of the paddle, 
(x, y) = (0.0, 0.0). This three-dimensional task aim vector is varied during the 
learning experiments to improve the performance of the juggling robot system. 
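
The inverse ballistics step follows directly from Equations 3.1 to 3.3. The sketch below is an illustration under assumptions that the text does not spell out: the ball leaves and lands at paddle height (z = 0), air resistance is ignored, g is taken as 9.81 m/s^2, and the function name is hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def outgoing_velocity(landing_xy, aim):
    """Velocity the ball needs at z = 0 to peak at aim z and land at aim x, y."""
    aim_x, aim_y, aim_z = aim
    vz = math.sqrt(2.0 * G * aim_z)  # peak height is vz^2 / (2 g)
    flight_time = 2.0 * vz / G       # Equation 3.3 with z_f = z_i = 0
    vx = (aim_x - landing_xy[0]) / flight_time  # Equation 3.1
    vy = (aim_y - landing_xy[1]) / flight_time  # Equation 3.2
    return (vx, vy, vz), flight_time


# Ball lands 0.10 m off center in x; aim for a 1.0 m peak and the paddle center.
velocity, flight_time = outgoing_velocity((0.10, -0.05), (0.0, 0.0, 1.0))
```

With the starting aim of 1.0 m, the ball is in the air for about 0.9 s between hits.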

Once estimates of the incoming and desired outgoing trajectories of the ball 
are made using the forward and inverse ballistics models, a restitution model 
predicts the angle and velocity with which to hit the ball. The model assumes 
perfect angular restitution of the ball, predicting that the angle of incidence and 
reflectance of the ball with respect to the paddle are equal. The incident and 
reflectant angles are decomposed into rotations about the x and y axes of the 
paddle. The angles at which to hit the ball are computed by averaging the incident 
and reflectant component angles 

    \phi_x = (\phi_x^i + \phi_x^r)/2    (3.4) 

    \phi_y = (\phi_y^i + \phi_y^r)/2    (3.5) 

where the angles are measured to the vertical. The restitution model also predicts 
the velocity with which to hit the ball. A simplified model [Beer and Johnston 
1977] assumes that the relative velocity of ball and paddle before and after the 
hit is proportional 

    v'_{paddle} - v'_{ball} = e \cdot (v_{ball} - v_{paddle})    (3.6) 

where primed quantities are taken to denote velocities after the hit. The 
proportionality factor, e, is termed the coefficient of restitution. The 
coefficient of restitution is determined experimentally by dropping the ball onto the 
stationary paddle, and measuring the height of the bounce relative to the initial 
height. 
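
The restitution model can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: velocities are taken along the paddle normal and positive upward, the paddle velocity is assumed unchanged by the impact, and the drop-test formula uses the standard result that impact and rebound speeds scale with the square root of the drop and bounce heights. All function names are hypothetical.

```python
import math


def hit_angle(incident, reflected):
    """Equations 3.4 and 3.5: average the incident and reflected component angles."""
    return 0.5 * (incident + reflected)


def coefficient_of_restitution(drop_height, bounce_height):
    """Estimate e from a drop onto a stationary paddle."""
    return math.sqrt(bounce_height / drop_height)


def paddle_velocity(v_ball_in, v_ball_out, e):
    """Solve Equation 3.6 for the paddle velocity that yields v_ball_out."""
    # v_p - v_out = e * (v_in - v_p)  =>  v_p = (v_out + e * v_in) / (1 + e)
    return (v_ball_out + e * v_ball_in) / (1.0 + e)


e = coefficient_of_restitution(drop_height=1.0, bounce_height=0.64)
v_paddle = paddle_velocity(v_ball_in=-4.43, v_ball_out=4.43, e=e)
```

The drop and bounce heights in the usage lines are made-up numbers; with them, returning the ball at the speed it arrived requires an upward paddle velocity of about 0.5 m/s.
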
Figure 3.2: Juggling Robot 



A kinematics model relates the desired angle and velocity of the paddle to the 
joint angles and velocities of the robot arm. Joints two and three of the MIT 
Serial Link Direct Drive Arm [An, Atkeson, and Hollerbach 1988] are used to hit 
the ball (Figure 3.2). Joint one is held with a brake in a fixed position during 
the juggling experiments. Joint two provides rotation about the y axis, allowing 
the robot to hit the ball in the x direction. Joint three provides rotation about 
an axis parallel to the x axis and serves two functions. The angular position of 
joint three is used to hit the ball in the y direction, returning the ball towards 
the center. The velocity of joint three is used to impart an upward velocity to 
the ball. The kinematics model relates the hit angles to robot joint angles by 
using a simple offset. This offset is determined experimentally as described in the 
calibration subsection. The kinematics model calculates the angular velocity of 
joint three based on the desired hit velocity and an estimate of the hit location. 

An angle-time offset model is necessary to correct the discrepancy between the 
forward ballistics and kinematics models. The forward ballistics model calculates 
the time at which the ball crosses the horizontal plane z = 0.0, the height of the 
level paddle. The kinematics model computes the angle away from the horizontal 
at which to hit the ball. An inconsistency occurs because the timing is estimated 
independently of the robot kinematics and because the kinematics of the manip- 
ulator couples movement in the third joint with a change in paddle position. The 
angle-time offset model takes this kinematic-timing behavior into account. Based 
on the angle at which the ball will be hit, the model estimates the height at which 
contact with the ball will be made. The model uses the forward ballistics data to 
estimate the velocity of the ball in this region. A time offset is calculated that 
represents the travel time of the ball between the actual hit position and the level 
paddle position. This offset is added to the estimated time at which to hit the 
ball. 

Figure 3.3: A Typical Juggling Trajectory (Joint Three) 

A trajectory model computes the joint trajectories that bring the paddle to 
the desired angle and velocity for each hit. The model includes three phases — 
speed-up, constant velocity, and slow-down — for the trajectories of joints two and 
three (Figure 3.3). In the speed-up trajectory, which lasts 200 ms, both joints 
start at the level position. Joint two executes a fifth-order polynomial starting 
from \theta_2^{level} to \theta_2^{hit}. Joint three follows a fifth-order polynomial from \theta_3^{level} to 
(\theta_3^{hit} - 0.025 \cdot \dot{\theta}_3^{hit}), ending with a velocity of \dot{\theta}_3^{hit}. In the constant velocity phase, 
which lasts 50 ms, joint two is servoed to the hit angle, \theta_2^{hit}. Joint three moves at 
constant velocity \dot{\theta}_3^{hit}, crossing the angle \theta_3^{hit} precisely mid-way through the 
trajectory. In the slow-down phase, which lasts 200 ms, joint two executes another 
fifth-order polynomial trajectory starting from \theta_2^{hit} back to \theta_2^{level}. Joint three 
executes a fifth-order polynomial trajectory starting from (\theta_3^{hit} + 0.025 \cdot \dot{\theta}_3^{hit}) with 
initial velocity \dot{\theta}_3^{hit}, and ending at the paddle level position, \theta_3^{level}. We chose this 
particular sequence of trajectories as a way to increase the probability that the 
paddle would hit the ball with the desired angle and velocity. 
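
The speed-up phase for joint three can be sketched with the standard closed-form coefficients of a fifth-order polynomial. This is an illustration under assumptions: the text does not state the boundary accelerations, so they are taken as zero here, and the function names and the example hit angle and velocity are hypothetical.

```python
def quintic(p0, v0, p1, v1, duration, a0=0.0, a1=0.0):
    """Coefficients c[0..5] of p(t) = sum(c[k] * t**k) meeting the boundary conditions."""
    T = duration
    dp = p1 - p0
    return [
        p0,
        v0,
        0.5 * a0,
        (20.0 * dp - (8.0 * v1 + 12.0 * v0) * T - (3.0 * a0 - a1) * T**2) / (2.0 * T**3),
        (-30.0 * dp + (14.0 * v1 + 16.0 * v0) * T + (3.0 * a0 - 2.0 * a1) * T**2) / (2.0 * T**4),
        (12.0 * dp - 6.0 * (v1 + v0) * T - (a0 - a1) * T**2) / (2.0 * T**5),
    ]


def position(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))


def velocity(coeffs, t):
    return sum(k * c * t**(k - 1) for k, c in enumerate(coeffs) if k > 0)


# Hypothetical hit: level angle 0 rad, hit angle 0.1 rad, hit velocity 2 rad/s.
theta_level, theta_hit, omega_hit = 0.0, 0.1, 2.0
# Speed-up: 200 ms from rest at level to 25 ms (half of the 50 ms phase) before the hit angle.
speed_up = quintic(theta_level, 0.0, theta_hit - 0.025 * omega_hit, omega_hit, 0.2)
# Constant-velocity phase: the joint crosses the hit angle at its midpoint, t = 25 ms.
theta_mid = position(speed_up, 0.2) + omega_hit * 0.025
```

Starting the constant-velocity phase 0.025 s of travel before the hit angle is what places the crossing of the hit angle at the midpoint of the 50 ms phase.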

A dynamics model and feedback controller are used to accurately drive the 
robot joints along the desired trajectory. The dynamics model compensates for 
the effects of gravity on joints two and three, as well as for the inertia of joint 
three. The inertia of joint two is not taken into account because the joint motion is 
small. The resulting feedforward torques are calculated based on the acceleration 
profiles of the hit trajectories. A position-velocity feedback controller ensures that 
the arm closely follows the desired trajectory on each hit. The control law that 
includes both feedforward and feedback torques is 

    \tau = \tau_{ffwd} - K_p \cdot (\theta - \theta_d) - K_v \cdot (\dot{\theta} - \dot{\theta}_d)    (3.7) 

where K_p and K_v are the position and velocity feedback gains. 
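
Equation 3.7 amounts to a single expression per joint. The sketch below is illustrative only; the function name, gain values, and sample states are hypothetical and are not taken from the robot controller.

```python
def joint_torque(tau_ffwd, theta, theta_d, omega, omega_d, k_p, k_v):
    """Equation 3.7: feedforward torque plus position-velocity feedback."""
    return tau_ffwd - k_p * (theta - theta_d) - k_v * (omega - omega_d)


# On the desired trajectory the feedback terms vanish.
on_track = joint_torque(3.0, 0.10, 0.10, 2.0, 2.0, k_p=50.0, k_v=5.0)
# A joint lagging in position and velocity receives extra torque.
lagging = joint_torque(3.0, 0.08, 0.10, 1.8, 2.0, k_p=50.0, k_v=5.0)
```

When the arm tracks the desired trajectory exactly, only the feedforward torque from the dynamics model is applied.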

3.2.2 Task Model 

With the component models developed in the last subsection, we can build a task 
model for each juggling hit. The task model relates the desired performance (aim) 
of the juggling system to the actual performance (hit) 

    hit = S(S^{-1}(aim))    (3.8) 

where S(S^{-1}()) represents the task model of the juggling system. It is important 
to note that the task model involves both the juggling system and inverse juggling 
model transformations. 

The aim and hit are specific to the robot juggling task that we described in the 
previous section. In our juggling task, the aim determines where we intend to hit 
the ball, and the hit represents where the ball actually lands. The aim is a three- 
dimensional quantity, with x, y, and z components. The x and y components 
represent where the ball should land relative to the paddle center, and the z 
component describes how high the ball should go. Similarly, the x, y, z components 
of hit describe where the ball actually landed and how high it actually went. The 
aim and hit can be expressed as column vectors 

    aim = [aim_x, aim_y, aim_z]^T    (3.9) 

    hit = [hit_x, hit_y, hit_z]^T    (3.10) 

where the symbol T denotes transpose. 

The model and system transformations in the task model are also specific 
to our juggling system. First, we decompose the inverse juggling system model, 
S^{-1}(), into the component models described in the previous subsection. The 
synthesis of these component models forms the juggling model 

    S^{-1}() = D^{-1}(T^{-1}(A^{-1}(K^{-1}(R^{-1}(B^{-1}(F^{-1}(V^{-1}()))))))))    (3.11) 

where D^{-1} describes the dynamics model, T^{-1} the trajectory model, A^{-1} the 
angle-time offset model, K^{-1} the kinematics model, R^{-1} the restitution model, 
B^{-1} the inverse ballistics model, F^{-1} the forward ballistics model, and V^{-1} the vision 
model. Second, we include the system transformation for the juggling robot, S(), 
in the task model equation and obtain 

    hit = S(D^{-1}(T^{-1}(A^{-1}(K^{-1}(R^{-1}(B^{-1}(F^{-1}(V^{-1}(aim)))))))))    (3.12) 

This task model describes the actions of the robot juggling system for each hit. 

In the ideal case, Equation 3.12 reduces to the identity transformation because 
the model accurately describes, or inverts, the system. Unfortunately, the juggling 
system, like many other robot systems, is difficult to model accurately. In reality, 
the model approximates the system, and Equation 3.12 only approximates the 
identity transformation. Just how accurately the task model resembles the identity 
transformation is discussed in Section 3.3.1. 
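
A one-dimensional toy example shows how a task model that only approximates the identity still supports learning. Everything in this sketch is hypothetical: the "true" system gain and offset, the trivial inverse model, and the function names are chosen for illustration, and the aim update is the rule of shifting the aim by the performance error described in Chapter 2.

```python
def system(command):
    """Hypothetical true system S, unknown to the planner."""
    return 1.2 * command + 0.3


def inverse_model(aim):
    """Imperfect model of S^-1: it ignores the gain and offset of the system."""
    return aim


def task_model(aim):
    """Equation 3.8: hit = S(S^-1(aim))."""
    return system(inverse_model(aim))


def practice(target, attempts):
    """Shift the aim by the performance error after each attempt."""
    aim, errors = target, []
    for _ in range(attempts):
        error = task_model(aim) - target
        errors.append(error)
        aim -= error
    return errors


errors = practice(target=1.0, attempts=6)
```

In this toy case each error is -0.2 times the previous one, so the error shrinks geometrically even though the inverse model is never corrected.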

3.2.3 System Calibration 

To accurately describe the robot juggling system, we need to calibrate the pa- 
rameters of our structural models. The calibration of these models is essential for 
two reasons. First, accurate models provide information on how to command the 
system to achieve a particular result. With accurate models of the component 
systems, we can control the juggling system more precisely. Second, accurate 
models improve the performance of our task-level learning algorithms. With ac- 
curate models, learning becomes more effective in correcting errors and therefore 
faster and more stable. 

The seven component models described above were calibrated before any jug- 
gling experiments were performed. Two of these models — kinematics and vision — 
are particularly sensitive to error. These models are calibrated at the start of each 
juggling session. Other models must be calibrated just once in order to achieve 
good system performance over time. 

The first model to be calibrated is the kinematics model. We calculate the 
joint offsets that are necessary to bring the paddle to the horizontal position. 
This calibration is performed by moving the paddle to a horizontal position and 
reading the joint resolvers. 

Once the position of the level paddle is found, the vision model is calibrated. 
This transformation is obtained by calibrating the cameras directly to the paddle 
coordinate frame. In the calibration procedure, a black pole, with a tennis ball 
mounted at the top, is attached to the paddle. The distance between the center 
of the paddle and the tennis ball is accurately measured. The robot moves this 
calibration pole to ten points within the field of view of each camera. Data is 
collected that includes centroids of the ball in camera coordinates and estimates 
of the ball location based on the robot kinematics model. The robot is then 
fitted with another calibration pole of a different length. Again, the robot moves 
this pole to ten points within the field of view of each camera while centroids 
and position estimates are calculated. Two calibration poles are necessary to 
adequately sweep out a large part of the camera space. 

The vision model is defined by the following equations that transform from 
camera pixel coordinates to paddle coordinates 

    x_{world} = m_x \cdot y_{camera1} + b_x    (3.13) 

    y_{world} = m_y \cdot y_{camera2} + b_y    (3.14) 

    z_{world} = m_z \cdot x_{camera2} + b_z    (3.15) 

A least-squares procedure is run on the 20 data points to determine the values 
of m_x, m_y, m_z, b_x, b_y, and b_z. Each camera is rotated 90° about its optical axis 
so that the x pixel coordinate axis is vertical. Also, the x value of camera one, 
which is redundant with the x value of camera two, is never used. 
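
Each of Equations 3.13 to 3.15 is a straight-line fit of a paddle coordinate against one pixel coordinate. A minimal sketch of the fit and of the resulting transformation follows; the function names, the tuple layout of the parameters, and the sample numbers are assumptions for illustration.

```python
def fit_line(pixels, positions):
    """Least-squares slope m and offset b for position = m * pixel + b."""
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_w = sum(positions) / n
    sxx = sum((p - mean_p) ** 2 for p in pixels)
    sxy = sum((p - mean_p) * (w - mean_w) for p, w in zip(pixels, positions))
    m = sxy / sxx
    return m, mean_w - m * mean_p


def to_paddle_coords(cam1, cam2, params):
    """Equations 3.13 to 3.15; cam1 and cam2 are (x, y) pixel centroids."""
    (m_x, b_x), (m_y, b_y), (m_z, b_z) = params
    x_world = m_x * cam1[1] + b_x  # y pixel of camera one
    y_world = m_y * cam2[1] + b_y  # y pixel of camera two
    z_world = m_z * cam2[0] + b_z  # x pixel of camera two (vertical axis)
    return (x_world, y_world, z_world)


# Made-up calibration data: pixel readings and the matching paddle x positions.
pixels = [100.0, 200.0, 300.0]
positions = [0.002 * p - 0.4 for p in pixels]
m_x, b_x = fit_line(pixels, positions)
```

The plumb-line adjustment described next would then only change the offsets b_x, b_y, and b_z, leaving the slopes as fitted.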

As a fine tuning mechanism, the vision model is further calibrated with the 
use of a plumb line. A plumb line is hung from the ceiling to fall exactly above the 
paddle center. A ball is attached to the line 0.50 m above the level paddle. The 
cameras are then used to calculate the centroid of the ball using the previously 
calibrated vision model. Any differences between the calculated centroid and the 
point (0.0, 0.0, 0.5) are noted. The constant offsets in the vision model are adjusted 
by these differences to reduce any error. 

3.2.4 Modeling Errors 

In this subsection, we describe the sources of modeling error in the robot juggling 
system. We point out the inaccuracies in some of the models that describe the 
robot system. Later, we will describe how task-level learning algorithms will 
improve the performance of the juggling robot without adjusting these inaccurate 
models. Now, our goal is to understand the capabilities of the juggling system, 
as well as our ability to control it. Although the issue of modeling error could be 
analyzed in great detail, we only provide an indication of the situation based on 
our six-month experience with the system. 

Several models that describe the juggling robot are inaccurate structural and 
parametric representations of the system. The vision model makes several assump- 
tions that are not valid in practice. First, the model assumes a simple orthographic 
projection model of image formation. The model should also include the effects of 
perspective projection. Second, the vision model assumes a linear transformation 
between camera coordinates and world coordinates. In practice, we should include 
nonlinearities caused by lens and imaging array distortion. These two assump- 
tions cause position errors in the estimated centroids of up to 0.02 m based on 
the trajectory of the ball. 

The ballistics models also add errors to the juggling system. First, the models 
propagate the position errors of the vision model. The computed trajectories 
are based on inaccurate estimates of ball location, causing inaccurate estimates 
of the landing position of the ball. These errors are not uniform throughout the 
camera field of view, but instead are related to the path of the ball. Second, the 
ballistics models rely on a simple representation of flight that is based solely on 
gravity. The effects of air resistance are not accounted for. Third, the ballistics 
models propagate any timing errors in the vision model. These timing errors can 
come from transmission delays on the serial line. When a juggling hit is based 
on inaccurate timing information, the ball will be hit inaccurately. Timing errors 
as small as 5 ms can cause 0.15 m inaccuracies in the landing position of a ball. 
In each case, the ballistics models are required to estimate the landing time and 
location of the ball 275 ms before the ball will be hit. Extrapolating forward in 
time magnifies any modeling errors. 

The restitution model is a fairly inaccurate representation of the contact be- 
tween the paddle and the tennis ball. Unfortunately, the mechanics of a paddle 
hitting a ball are difficult to model. The restitution model used in the juggling 
system assumes a smooth and frictionless paddle and ball. The ball and paddle 
obviously do not fit this description. Based on this assumption, the model as- 
sumes perfect angular restitution. Furthermore, the model assumes a perfectly 
flat and uniformly stiff paddle. Our paddle is more rigid along the y axis, and 
more flexible at the edges. The paddle is also slightly concave. Finally, the model 
does not account for the effects of a spinning ball. 

The angle-time offset model is based on an approximation of the ball's velocity. 
The model assumes that the ball travels with constant velocity near the hitting 
plane. In fact the ball is accelerating in that region. The constant velocity ap- 
proximation causes timing errors on the order of 1 ms. An iterative scheme could 
be implemented to better describe the coupling between kinematics, restitution 
angle, and forward ballistics. 

The dynamics model only partially describes the trajectory-following behavior 
of the robot arm. The model assumes no friction or damping, and no mechanical 
coupling between the joints. A feedback controller is included in the system to 
reduce the effects of these assumptions. Experiments reveal that the angular error is 
less than 1° and the velocity error is less than 0.5 rad/s near the point of contact 
with the ball. The errors vary with the trajectory, but tend to be repeatable for 
a given trajectory. 

3.2.5 System Noise 

In this subsection, we discuss the issue of noise — how repeatably the juggling 
system can hit a ball to a particular location. The repeatability problems arise 
from physical realities that we do not or are not able to model accurately. 

As a measure of system repeatability, we performed several experiments. First, 
we repeatedly dropped a tennis ball from a solenoid mechanism onto a stationary 
paddle. Second, we repeatedly hit a tennis ball dropped onto a moving paddle. 
Third, we analyzed the ability of the system to hit the tennis ball on the second 
hit. 

In our first experiments, we measured the position to which a tennis ball 
bounced after hitting the stationary paddle. The robot was turned off and the 
paddle was clamped to a horizontal position. The ball was released from a solenoid 
1.5 m above the paddle. Based on our vision system, we concluded that the ball 
repeatedly followed the same trajectory downward. We analyzed in detail the 
repeatability of the position to which the ball bounced. The standard deviation 
of the landing position in the x and y directions was $\sigma_x = 0.05$ m and 
$\sigma_y = 0.06$ m. The standard deviation of the maximum height attained by the 
ball was $\sigma_z = 0.01$ m. Relative to the repeatability of the vision and 
forward ballistics models, these standard deviations were significant. 

We performed similar experiments with the paddle moving. The robot was 
programmed to hit the ball straight upward. Fifteen trials were performed in 
which the paddle was commanded with the same trajectory. We were interested 
in the landing position of the ball on the bounce following the hit. The standard 
deviation of the landing position was $\sigma_x = 0.07$ m and $\sigma_y = 0.07$ m. 
The standard deviation of the maximum height attained by the ball was again 
$\sigma_z = 0.01$ m. 

Finally, we analyzed the landing position of the ball after two hits. Once 
again, fifteen trials were performed. In this case, the standard deviations of the 
errors were larger than in the previous experiments: $\sigma_x = 0.10$ m, 
$\sigma_y = 0.09$ m, and $\sigma_z = 0.03$ m. These standard deviations are 
significant compared to the paddle size, which is 0.46 m on a side. 

Our goal in performing these experiments was to assess the repeatability of 
the contact between paddle and ball. Unfortunately, we do not understand the 
phenomenon of contact very well. Our intuition suggests that a part of the re- 
peatability problem is caused by the platform tennis balls. The balls have a single 
seam that results from the molding process. This seam, and any non-uniformities 
in the rubber compound of the ball, could cause unrepeatable bounces. We chose 
the platform tennis balls over standard tennis balls because the standard tennis 
balls exhibited even larger repeatability errors. Racquet balls or hand balls have 
a smoother surface and might provide more consistent bounces. 

These experiments demonstrate the level of noise in the juggling system. This 
noise prevents the robot from performing the juggling task without visual feed- 
back for more than three or four consecutive hits. The juggling task cannot be 
performed "open loop." As a result, the visual feedback and the eight component 
models that describe the juggling robot are indispensable for hitting the ball many 
times in a row. 



3.3 Learning the First Hit 

In this section, we describe how to apply task-level learning algorithms to im- 
prove the robot system's performance of the first hit. We analyze the theoretical 
convergence properties of the fixed-model learning algorithms as they apply to 
the juggling task. We describe experiments in which the learning algorithms are 
successfully applied to the first hit. Finally, we provide an indication of the effect 
of system noise on the learning algorithms. 

Learning the first hit is a good starting point for applying our task-level learning 
algorithms to the juggling task. First, we are using the complete set of models 
described in the previous section. Testing the task-level learning algorithms on a 
system with eight component models provides a good indication of the applica- 
bility of the learning techniques to complex tasks. Second, the first hit provides a 
straightforward way to test the success of learning a multi-dimensional task. As 
opposed to the throwing experiments that have only target height as the goal, the 
first juggling hit has three separate goals. The goal is: (1) to hit the ball to the 
height of 1.0 m, (2) for the ball to land at a position x on the paddle, and (3) for 
the ball to land at a position y on the paddle. Finally, the effect of noise on the 
task-level learning algorithms can be easily analyzed. Learning algorithms can be 
applied in the presence of noise, or while averaging out the effects of noise. With 
these experiments, we can get an indication of whether too much noise exists in 
the juggling system for learning to be successfully applied. 

Learning the first hit is also a good starting point because the first hit task 
is simpler than the entire juggling task. The first hit is simpler because the ball 
always follows the same downward trajectory. The ball is dropped from the ceiling 
by a solenoid release mechanism and falls to the same location on the paddle, with 
the same velocity, and with the same flight time. Learning the entire juggling task 
will be more difficult because of the need for generalization. After the first hit, 
the ball will invariably follow different downward trajectories and land at different 
locations on the paddle. The learning that takes place at one location must then 
be generalized to another location. Analyzing the first hit allows us to temporarily 
ignore the issue of generalization, and instead to concentrate on understanding 
task-level learning. 

3.3.1 Convergence Criteria 

Before performing any juggling experiments, we examine whether fixed-model 
learning will theoretically converge for the juggling task. The predicted 
convergence is given by 

$$\delta\mathrm{aim}_{n+1} = (I - J\hat{J}^{-1})\,\delta\mathrm{aim}_n \tag{3.16}$$

which was derived as Equation 2.17 in Chapter 2. Fixed-model learning will 
converge when all the eigenvalues of the matrix $(I - J\hat{J}^{-1})$ are less than 
one in absolute value. The magnitudes of the eigenvalues of $(I - J\hat{J}^{-1})$ 
depend on how accurately $\hat{J}^{-1}$ inverts $J$, or how accurately the system is 
modeled. The better the model approximates the system, the closer the magnitudes 
will be to zero, and the more likely learning is to converge. 

The eigenvalues of $(I - J\hat{J}^{-1})$ can be determined experimentally by 
performing some simple experiments. The experiments involve analyzing the validity 
of the task model near the desired task performance. In other words, the 
experiments involve analyzing whether the juggling model accurately describes the 
robot system. 

Figure 3.4: Task Model: X Aim/Hit 

To analyze the juggling task model, 

$$\mathrm{hit} = S(\hat{S}^{-1}(\mathrm{aim})) \tag{3.17}$$

we perform experiments that vary the desired performance, aim, and monitor the 
resulting actual performance, hit. Since our goal is to hit the ball to x, y, z = 
(0.0, 0.0, 1.0), the aim is varied in this neighborhood. The inverse model of the 
system, $\hat{S}^{-1}()$, is used to compute the necessary torque commands based on 
the prescribed aim. The commands are used to drive the juggling system, effectively 
passing robot commands through the $S()$ transformation. After varying the task 
aim on a number of trials, we can plot the resulting hit versus aim. Since the 
juggling task is multi-dimensional, we begin by analyzing aim/hit behavior along 
the x, y, and z axes. The graphs associated with the juggling task model are 
shown in Figures 3.4, 3.5, and 3.6. Three hits were performed at each aim to 
show the noise in the system. 
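As a rough illustration of how aim/hit data of this kind could be reduced to a single slope per axis, the sketch below fits a least-squares line through repeated hits at several aims. The sample values and the function name `fit_slope` are hypothetical and are not taken from the experiments; only the idea of three hits per aim comes from the text.

```python
def fit_slope(aims, hits):
    """Least-squares slope and intercept of hit versus aim along one axis."""
    n = len(aims)
    mean_a = sum(aims) / n
    mean_h = sum(hits) / n
    sxx = sum((a - mean_a) ** 2 for a in aims)
    sxy = sum((a - mean_a) * (h - mean_h) for a, h in zip(aims, hits))
    slope = sxy / sxx
    return slope, mean_h - slope * mean_a


# Hypothetical x-axis data: three hits at each of three aims (meters).
aims = [-0.10, -0.10, -0.10, 0.00, 0.00, 0.00, 0.10, 0.10, 0.10]
hits = [-0.14, -0.09, -0.12, 0.03, -0.02, 0.00, 0.12, 0.10, 0.13]

slope, intercept = fit_slope(aims, hits)
print(f"slope = {slope:.2f}, intercept = {intercept:.3f}")
```

A slope of exactly one would mean that a change in aim produces an equal change in hit along that axis; the scatter of the repeated hits about the fitted line gives a feel for the noise.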

The matrix $(I - J\hat{J}^{-1})$ is determined directly from this data. We simply 
analyze the change in hit as a result of the change in aim, since we can approximate 
Equation 2.13 or 3.17 by 

$$\delta\mathrm{hit} = (J\hat{J}^{-1})\,\delta\mathrm{aim} \tag{3.18}$$

The diagonal terms of the matrix $(J\hat{J}^{-1})$ can be readily gleaned from 
Figures 3.4, 3.5, and 3.6 by simply measuring each slope. The off-diagonal terms 
are obtained by analyzing the change in one component of the hit with respect to a 
change in a different component of the aim. We have analyzed data for the 
off-diagonal terms and find small, noisy correlations. For our purposes, the matrix 
$(J\hat{J}^{-1})$ reduces to 

$$J\hat{J}^{-1} = \begin{bmatrix} 1.17 & -0.19 & 0.12 \\ 0.05 & 0.85 & -0.01 \\ 0.08 & -0.11 & 1.12 \end{bmatrix} \tag{3.19}$$

From this matrix, we can readily compute the eigenvalues of the convergence 
matrix $(I - J\hat{J}^{-1})$ and obtain $\lambda_{1,2,3} = 0.22, -0.13, 0.04$. These 
eigenvalues are all less than one in absolute value, suggesting that fixed-model 
learning will converge 
for juggling hits in this neighborhood. Our analysis of convergence is only valid 
near the aim = (0.0,0.0, 1.0), however. To analyze convergence for a hit to the 
height of 2.0 m, we would have to do similar experiments to the ones described 
above in which the aim is varied near (0.0,0.0,2.0). The slopes of the graphs 
could again be computed, the matrix elements identified, and the eigenvalues 
calculated. Similarly, the convergence criterion we derived experimentally is only 
valid for hitting the ball from the point (0.0,0.0, 0.0) on the paddle. If the ball is 
dropped to another point on the paddle, a new set of data must be examined to 
assess convergence in that neighborhood. 
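The convergence test can be checked numerically from the matrix in Equation 3.19. The sketch below is not part of the original analysis: it forms the convergence matrix of Equation 3.16 and estimates its largest eigenvalue magnitude by power iteration, using only the standard library. The helper names are illustrative. Only the magnitudes matter for the test, so the sign convention of the eigenvalues does not affect the conclusion.

```python
import math

# (J J_hat^-1) as given in Equation 3.19
JJ_INV = [[1.17, -0.19, 0.12],
          [0.05, 0.85, -0.01],
          [0.08, -0.11, 1.12]]

# Convergence matrix (I - J J_hat^-1) from Equation 3.16
A = [[(1.0 if i == j else 0.0) - JJ_INV[i][j] for j in range(3)]
     for i in range(3)]


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def spectral_radius(m, iterations=200):
    """Estimate the largest eigenvalue magnitude by power iteration."""
    v = [1.0, 1.0, 1.0]
    growth = 0.0
    for _ in range(iterations):
        w = matvec(m, v)
        growth = norm(w) / norm(v)   # per-step growth factor of the error
        scale = norm(w)
        v = [c / scale for c in w]   # renormalize to avoid underflow
    return growth


rho = spectral_radius(A)
print(f"largest eigenvalue magnitude: {rho:.3f}")
print("fixed-model learning predicted to converge:", rho < 1.0)
```

The estimate is consistent with the leading eigenvalue magnitude of about 0.22 quoted above, so the predicted error shrinks on every iteration in this neighborhood.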

We conclude that learning will converge for this particular hit, but may or 
may not converge for hits that are significantly different. Since the ball is always 
dropped to the same position at the start of each juggling sequence, the learning 
algorithms should converge for the first hit. From the point of view of accurate 
modeling, we can conclude that our model describes the juggling system "well 
enough" for the first hit. Similarly, we can conclude that the juggling model has 
been calibrated "sufficiently" for the first hit. 

3.3.2 Learning while Suppressing Noise 

In our first experiments, we apply the fixed-model learning algorithms to the first 
juggling hit while suppressing the effect of system noise. Noise is suppressed by 
performing each hit five times and averaging the results. This averaging procedure 
reduces the effect of noise, allowing us to focus on the success of the task-level 
learning algorithms applied to this multi-dimensional task. 

The first hit was successfully performed after four learning iterations. The 
target was to hit the ball to the point (0.0, 0.0, 1.0). The learning sequence began 
with the juggling system aiming at (0.0, 0.0, 1.0). Five trials were performed 
with this aim, and the resulting average hit was (-0.26, 0.46, 1.21). The juggling 
robot consistently hit the ball forward, to the left, and too high. Based on the 
average hit, a new aim of (0.26, -0.46, 0.79) was calculated using Equation 2.12. 
Once again, five trials with this aim were performed, and the results averaged 
to (0.15, -0.23, 1.12). A new aim of (0.11, -0.23, 0.67) was calculated, and five 
more hits were performed. The averaged results were (-0.02, -0.07, 1.00). We 
applied fixed-model learning one more time with an aim of (0.13, -0.16, 0.67), 
and the actual hits averaged out to (0.00, 0.01, 0.99). At this point, the task 
was considered complete since the error in each hit component was less than one 
standard deviation from the target. The measured standard deviations in the task 
components were $\sigma_x = 0.07$ m, $\sigma_y = 0.07$ m, and $\sigma_z = 0.01$ m. 
Figure 3.7 shows the sequence of aims that was necessary to achieve the task, and 
Figure 3.8 shows the corresponding sequence of component x, y, z errors. 

Figure 3.8: Learning the First Hit (Noise Suppressed) 
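The learning sequence reported above can be replayed in a few lines. The sketch assumes the aim update has the simple form aim_{n+1} = aim_n - (hit_n - target), which is the form written later as Equation 3.20 without its state dependence. The function name `next_aim` is illustrative; the averaged hits are the values reported in the text.

```python
TARGET = (0.0, 0.0, 1.0)


def next_aim(aim, hit, target=TARGET):
    """Shift the aim by the negative of the measured task error."""
    return tuple(a - (h - t) for a, h, t in zip(aim, hit, target))


# Averaged hits (five trials each) reported for the first three aims.
averaged_hits = [
    (-0.26, 0.46, 1.21),
    (0.15, -0.23, 1.12),
    (-0.02, -0.07, 1.00),
]

aim = TARGET
aims = [aim]
for hit in averaged_hits:
    aim = next_aim(aim, hit)
    aims.append(aim)

for a in aims:
    print(tuple(round(c, 2) for c in a))
```

To two decimal places, the replayed aims agree with the aims reported in the text, ending at (0.13, -0.16, 0.67).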

The rate of convergence differs from the rate estimated from our convergence data. 
Our data from Section 3.3.1 predicts a geometric rate of convergence of 0.22, 
suggesting that the errors should be reduced by approximately 78% on each iteration. 
The actual geometric convergence rate more closely approximates 50% on each 
iteration. Several reasons may explain this discrepancy. First, the convergence 
criterion derived using fixed point theory is only valid for linear modeling errors. 
The error between the inverse model, $\hat{S}^{-1}()$, and the system transformation, 
$S()$, is probably non-linear. A denser plot of the aim/hit behavior of the juggling 
task model would probably show the non-linear relationship. Second, the coupling 
terms of the matrix $(I - J\hat{J}^{-1})$ will tend to slow convergence. While 
correlations for these coupling terms in our data were small, some coupling does 
exist. Although fixed-model learning effectively decouples the command corrections 
by transforming the new aim through the inverse system model, the decoupling is 
only as accurate as the model. As a result, changes in one component of the aim 
will affect another component of the hit. 

The aim that was finally used to perform the first juggling hit successfully was 
(0.13, -0.16, 0.67). The robot juggling system was aiming backward, to the right, 
and low in order to hit the ball to the desired target location. The difference 
between this aim vector and the target vector (0.0,0.0, 1.0) is an indication of the 
inaccuracy of the juggling model. However, with this modeling accuracy task-level 
learning successfully converges after four iterations. 

3.3.3 Learning in the Presence of Noise 

Additional experiments were performed to assess the success of fixed-model learn- 
ing in the presence of noise. The experiments were designed to provide only an 
indication of the effect of noise, since a full assessment requires numerous trials and 
a detailed statistical analysis that are beyond the scope of this thesis. The goal 
in performing a limited number of experiments is to determine whether learning 
is possible with the amount of noise present in the juggling system. A secondary 
goal is to discover the possible ways to reduce the effect of noise on learning. 

In contrast to the previous learning experiments of the first hit, the data was 
not averaged and no trials were repeated. Fixed-model learning was applied di- 
rectly after each trial to provide an indication of the effect of noise on convergence. 
Three learning sequences are presented in Figures 3.9, 3.10, and 3.11. The plots 
show the component errors in the task goal versus the trial number. Learning 
converged in four trials, eight trials, and five trials, respectively. In each case, 
learning took at least as many iterations (but fewer trials) to converge as in the 
previous experiments when noise was suppressed by averaging. The criterion for task 
completion was the same as in the previous experiments. 

Our criterion for task convergence no longer guarantees that the learned aim 
will consistently lead to a successful hit. Noise in the system will allow a number 
of different aims to produce the same target hit. For example, the final aims 
for the three learning sequences in the presence of noise were (0.11, -0.10, 0.71), 
(0.10, -0.18, 0.69), and (0.09, -0.19, 0.69). These aim vectors are not equal, and 
differ from the aim vector of (0.13, -0.16, 0.67) learned during the trial averaging 
sequence. The difference in the aims is within the repeatability of the system. 
Some trial averaging or noise suppression seems necessary for the learned aim to 
converge to the desired one. 

Figure 3.9: First Hit: Learning with Noise, Sequence 1 

Figure 3.11: First Hit: Learning with Noise, Sequence 3 

Noise in the system promotes oscillatory behavior in the learning sequence. 
As a result, task errors tend to be reduced more slowly in these three learning 
sequences. Applying learning to a system with noise sometimes increased and 
sometimes decreased the errors in performance. For example, system noise made 
the learning both converge and diverge between the second and third hits in 
the learning sequence described by Figure 3.10. Error in the x and y components 
increased, while error in the z task component was reduced. Learning continued to 
oscillate after the third hit with the amplitude in the error decaying. The tentative 
conclusion to draw is that noise reduces the rate of convergence, increases the 
oscillatory nature of the learning algorithm, and prevents learning from converging 
beyond the noise level of the system. 
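The qualitative effect of noise can be explored with a small simulation. The sketch below is a hypothetical linearized stand-in for the juggling system, not the real robot: it assumes the hit deviates from the target by the matrix of Equation 3.19 applied to the aim offset from a "correct" aim, plus Gaussian noise with the measured standard deviations. The plant model, the choice of correct aim, and all names are assumptions made for illustration.

```python
import random

# (J J_hat^-1) as given in Equation 3.19
M = [[1.17, -0.19, 0.12],
     [0.05, 0.85, -0.01],
     [0.08, -0.11, 1.12]]
TARGET = (0.0, 0.0, 1.0)
AIM_STAR = (0.13, -0.16, 0.67)   # aim that gave a successful averaged hit
SIGMA = (0.07, 0.07, 0.01)       # measured standard deviations (m)


def simulated_hit(aim, rng, sigma):
    """Hypothetical plant: hit = target + M (aim - aim_star) + noise."""
    d = [a - s for a, s in zip(aim, AIM_STAR)]
    return tuple(
        TARGET[i] + sum(M[i][j] * d[j] for j in range(3))
        + rng.gauss(0.0, sigma[i])
        for i in range(3)
    )


def learn(n_trials, sigma, seed=0):
    """Apply the aim update after every single trial; return the errors."""
    rng = random.Random(seed)
    aim = TARGET
    errors = []
    for _ in range(n_trials):
        hit = simulated_hit(aim, rng, sigma)
        err = tuple(h - t for h, t in zip(hit, TARGET))
        errors.append(err)
        aim = tuple(a - e for a, e in zip(aim, err))
    return errors


for label, sigma in (("no noise", (0.0, 0.0, 0.0)), ("with noise", SIGMA)):
    errs = learn(10, sigma)
    print(label, [round(max(abs(c) for c in e), 3) for e in errs])
```

Under these assumptions the noise-free run shrinks the error geometrically, while the noisy run keeps fluctuating at roughly the size of the noise, which is in line with the tentative conclusion drawn above.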



3.4 Learning the Second Hit 

In this section, we describe how the robot system learns the second hit with task- 
level learning algorithms. We begin with a discussion of why the robot needs 
to learn anything more than the first hit. We describe a training sequence that 
the robot effectively undergoes to perform the first and second hits. We present 
experimental results of the robot juggler learning to perform the second hit. We 
finally describe the performance of the robot juggling system after learning the 
first and second hit. 

Before running the juggling system past the first hit, we have to decide what 
aim to apply to the successive hits. First, the target aim can be applied directly. 
This approach assumes that the juggling model accurately describes the robot 
system. Second, the learned aim vector for the first hit can be applied to all the 
remaining hits. In this case, learning from the first hit is generalized to the second 
and successive hits. 

With either of these approaches, the robot juggler is unsuccessful at hitting 
the ball more than several times. In the first approach, when the target aim vector 
of (0.0,0.0, 1.0) is applied after the first hit, we are assuming that the system is 
perfectly calibrated. Our data in Figures 3.4, 3.5, and 3.6 dispute this assumption, 
suggesting that the juggling model only approximates the system. Experiments 
using this approach result in the ball drifting off the paddle after several hits. 
In the second approach, when the aim vector of the first learned hit is applied, 
the robot is just as unsuccessful at hitting the ball more than several times. In 
this case, the assumption made is that learning from the first hit generalizes to 
the second hit. Unfortunately, the state, or operating point, of the task model 
differs for each hit and generalizing is not that simple. In the first hit, the ball 
is dropped from 1.5 m and reaches the paddle at approximately 1.5 m/s. For the 
second hit, the ball peaks at a height of 1.0 m, and reaches the paddle with a 
velocity of 1.0 m/s. The accuracy of the juggling model is different in each case. 
As a result, learned compensations in the aim for the 1.5 m drop are very different 
from learned compensations in the aim on the 1.0 m bounce. 

Because neither of these approaches proved successful, we applied task-level 
learning to the second hit. The second hit is learned much as the first hit was 
learned. 

In choosing to improve the performance of the first hit and then the second 
hit, we are effectively "training" the robot. The juggling task is being decomposed 
into two simpler subtasks. Each juggling subtask is learned sequentially in order 
to improve performance of the entire task. In subsequent sections, a third subtask 
is learned before the robot system is capable of performing the juggling task. 

3.4.1 Learning Experiments 

Hitting the ball the second time is much like hitting the ball the first time. The 
ball should peak at approximately z = 1.0 m, and fall to the center of the paddle, 
which is denoted by x, y = (0.0, 0.0). The ball usually follows this trajectory, 
except for deviations caused by noise in the system. Deviations from the desired 
trajectory are monitored using visual feedback. Based on this data and the juggling 
model, the robot system computes the appropriate paddle trajectory that will hit 
the ball to the target. Once again the target is x, y, z = (0.0, 0.0, 1.0). 

Figure 3.12: Learning the Second Hit 

We applied task-level learning to improve the robot's performance of the second 
hit. The robot was programmed to perform the first hit with the task aim that 
was learned in our previous set of experiments. This task aim was initially used on 
the second hit, also. This initialization was effectively a method of generalizing 
estimates of the inaccuracies in the task model, and applying them from the 
first hit to the second. As we explained above and as the data will show, this 
generalization was by no means a perfect one. 

Task-level learning was initially applied in the presence of noise. However, 
when the robot was within one standard deviation of the hit, three trials were 
performed and averaged before making another learning iteration. With this 
method, we hoped to rapidly converge to the area around the desired aim. Once 
near the desired aim, three trials were performed at each learned aim. This averaging 
procedure suppressed the effect of noise on the learning algorithm and made us 
more certain that the correct aim had been found. 

Task-level learning successfully converged after the ninth iteration. A graph of 
the performance errors at each learning iteration is shown in Figure 3.12. Within 
seven iterations in the presence of noise, learning had converged to within one 
standard deviation ($\sigma_x = 0.07$, $\sigma_y = 0.07$, $\sigma_z = 0.01$). Three 
trials were averaged at each successive learning iteration. With this noise 
suppression approach, learning again converged to within one standard deviation of 
the error on the ninth iteration. The task aim that finally resulted in the 
successful hit was (0.06, -0.12, 0.84). 

3.4.2 Juggling Performance After Learning Hits One and Two 

In this subsection, we ask the question: How well can the system juggle after 
learning the first and second hits? The aim necessary to generate a successful 
first hit was found in Section 3.3. The aim necessary to successfully perform 
the second hit was determined in Section 3.4. In the following experiments, the 
learned aim from the second hit is generalized to all successive hits, and the 
system is commanded to juggle until the ball strays from the paddle. 

We generalize the learned aim for the second hit to all successive hits. The 
aim learned for the second hit of (0.06,-0.12,0.84) is used as the aim for all 
successive hits. We command the successive hits with this aim, reasoning that 
the second hit is similar to the successive ones. This assumption is valid if the 
ball always returns to the center of the paddle. If the ball lands far from the 
center of the paddle, the hits are sufficiently different that the generalization may 
not be accurate. Nonetheless, this generalization is better than commanding the 
target aim (0.0, 0.0, 1.0), which was shown to produce fewer than three or four hits 
in Section 3.4. 

We performed juggling experiments using a learned aim for the first hit, a 
learned aim for the second hit, and the learned aim for the second hit generalized 
to successive hits. The robot juggling system performed 20 juggling sequences in 
this configuration. The robot averaged eight hits, with a low of three hits and a 
high of 23 hits. The performance of the robot could be characterized as erratic. 

A visual analysis of the robot failures indicates that the ball oscillates back and 
forth in the y direction on successive bounces, eventually landing out of reach. The 
robot tends to overcompensate along this direction. When the ball is hit forward 
of the center, the robot overreacts by hitting the ball too far back. If the ball is 
hit behind the center, the robot overcompensates, hitting the ball too far forward. 
These oscillations eventually cause the ball to stray from the paddle area. 

The tentative conclusion we reach is that the command learned for the second 
hit cannot be arbitrarily generalized to all other hits. In other words, the learned 
aim for the second hit of (0.06,-0.12,0.84) should not be used as the aim for 
all successive hits. In particular, the aim learned for a ball falling to the paddle 
center differs from the command necessary to compensate for a ball that lands 
far forward of the center. The inaccuracy in the task model that was successfully 
learned on the second hit differs from the inaccuracy in the task model that exists 
for balls landing forward or backward of the paddle center. The aim must be 
learned based on the location the ball lands on the paddle. 



3.5 Learning the Successive Hits 

In this section, we apply task-level learning past the first and second hits of a jug- 
gling sequence. We first analyze the performance errors of the juggling system in 
order to understand why the robot only averages eight hits. We then describe how 
to apply task-level learning in order to further improve the juggling performance 
of the robot system. In doing so, we address the issues of generalization and task 
state in the context of the juggling system. Finally, we describe the performance 
of the juggling robot, which eventually averages 25 hits after task-level learning 
is applied. 

3.5.1 Examining Performance Errors 

We begin by examining the performance errors that occur during juggling. We 
analyze data from the 20 juggling trials which were described in Section 3.4.2. 
Our goal is to understand whether the characteristics of each hit indicate what 
performance errors are likely to occur. 

We define three performance errors that correspond to the performance goals 
of the system. The performance errors are x errors, y errors, and z errors. Each 
error is the measured distance between the target vector (0.0,0.0, 1.0) and the hit 
vector. The error is always measured to the target vector of (0.0, 0.0, 1.0) because 
the juggling task specifies that the ball should always be hit to the height of 1.0 m 
and to the center of the paddle. 

The performance errors of the juggling robot can be understood based on the 
characteristics of each hit. In order to interpret the error for each of the 160 jug- 
gling hits, we examine each performance error versus the x and y paddle location 
from which the ball was hit. The performance errors could also be analyzed with 
respect to the x, y, and z velocity of the ball at the location from which it was 
hit. However, we chose to simplify our analysis from the start, using the state 
variables corresponding only to location of the ball, not velocity. In addition, a 
statistical analysis suggests that performance errors correlate most strongly to the 
x and y paddle location from which a ball is hit. 

Since the performance data is intrinsically three-dimensional, we have used two 
methods to present the data. First, we display three error grids in Figure 3.13 
that correspond to the x, y, and z performance errors of the juggling system. 
The paddle is divided up into a 4 x 4 grid, with a grid resolution of 0.12 m. Each 
grid is labeled with the x and y axes that correspond to the position from which 
the ball was hit. The grid axes are the same as the x, y axes of the robot's paddle. 
The performance errors of all the hits that are made from one grid section are 
averaged together. The average error is displayed in the appropriate section in 
Figure 3.13. The grid is thus composed of average performance errors based on 
paddle hit location. The number of hits that occurred in each section is displayed 
in the bottom-right grid of Figure 3.13. 

Figure 3.14: X Performance Errors 
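The binning and averaging behind such error grids can be sketched as follows. The 4 x 4 layout and the 0.12 m resolution come from the text; the assumption that the grid is centered on the paddle origin, the sample hits, and the function name `error_grid` are illustrative only.

```python
def error_grid(hits, half_width=0.24, n=4):
    """Average one error component over an n x n grid of paddle locations.

    hits: iterable of (x, y, error), where (x, y) is the paddle location the
    ball was hit from. Returns (means, counts); a cell with no hits has mean
    None, which plays the role of the ellipsis in the printed grids.
    """
    cell = 2.0 * half_width / n
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for x, y, err in hits:
        col = int((x + half_width) // cell)
        row = int((y + half_width) // cell)
        if 0 <= row < n and 0 <= col < n:   # ignore hits outside the grid
            sums[row][col] += err
            counts[row][col] += 1
    means = [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(n)]
        for r in range(n)
    ]
    return means, counts


# Hypothetical y-error samples: (x location, y location, y error) in meters.
sample_hits = [(0.05, 0.05, 0.10), (0.07, 0.02, 0.20), (-0.20, -0.20, -0.05)]
means, counts = error_grid(sample_hits)
for row in means:
    print(["..." if m is None else round(m, 2) for m in row])
```

One grid would be built per error component (x, y, and z), with the `counts` table corresponding to the grid of hit counts.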

Each grid provides an indication of the kind of errors that occur in the juggling 
system. The x grid suggests that the ball is hit too far to the left (a negative error), 
except when it is hit from the center of the right edge. The y grid suggests that 
the ball is consistently hit too far forward when it is hit from the lower edge of 
the paddle. The z grid indicates that the ball is consistently hit too low when hit 
from the left side and too high when hit from the right side of the paddle. An 
ellipsis in a grid box indicates that the ball was never hit from that region of the 
paddle. 

Another way to understand the performance errors is to graph each error as 
a function of the paddle location from which the ball was hit. The x, y, and 
z performance errors are each plotted against the x and y paddle location from 
which the ball was hit. The plots are shown in Figures 3.14, 3.15, and 3.16. 
The plots of y performance error show a parabolic correlation between the y paddle 
location and y errors. The two plots of z performance errors show a correlation 
between both x and y paddle location and the z errors. 

The data from both the plots and grids offer insight into the performance of 
the juggling robot. First, the data is somewhat predictive. Based on 
the x and y location from which a ball is hit, the data predicts the performance 
error that is likely to result. Still, some performance errors do not correlate at all 
with paddle location. For instance, the x paddle location correlates very poorly with 
x performance errors. Second, the data coincides with some intuitive observations 
of the juggling robot. For example, the large positive errors in y performance 
that occur when a ball is hit from the lower edge of the paddle suggest that the 
robot overcompensates on these hits. This conclusion is similar to the qualitative 
observation made earlier that balls tend to oscillate front-to-back until they are 
out of reach. Third, the robot system is clearly noisy. Balls that are hit from 
the same location are often hit to different points. Correlations do exist between 
paddle location and performance errors, but only in the presence of significant 
noise. 

3.5.2 What the Errors Mean 

In this subsection, we try to understand what the performance errors mean in 
the context of task-level learning. In other words, we try to answer the question: 
what can the juggling robot learn? 

The data shows the performance errors that occur when a ball is hit from a 
particular location on the paddle. The data is similar to the errors that were 
corrected when the first and second hits were learned. In those cases, an x, y, and 
z performance error occurred when the robot used the target aim of (0.0,0.0, 1.0) 
to hit the ball. That performance error was dependent on the ball being hit from 
the x, y = (0.0, 0.0) location on the paddle. Our new data provides an indication 
of the performance error of the system for a ball hit from any other location on 
the paddle. 

The performance error at each paddle location can be used to correct the aim 
for a hit from that particular paddle location. When the first and second hit were 
learned, the performance error was used to modify the aim. Now that we have 
data for hits from different paddle locations, we can correct the aim for many 
different hits. The aim for a particular hit can be corrected with Equation 2.10 
by using the performance error that occurred when a ball was previously hit from 
that location. To include this notion of state, Equation 2.10 can be rewritten 

$$\mathrm{aim}_{n+1}(x, y) = \mathrm{aim}_n(x, y) - (\mathrm{hit}_n(x, y) - \mathrm{target}) \tag{3.20}$$

The script variables x and y denote the state of the system — the location on the 
paddle from which a ball is hit. Thus, a correction to the aim is calculated based 
on the state of the juggling system. 
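
As an illustration only (this is not code from the juggling system), the state-based update of Equation 3.20 can be sketched in Python. The function names, the example hit model, and its numbers are hypothetical; only the update rule itself comes from the text.

```python
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def corrected_aim(aim: Vec3, hit: Vec3, target: Vec3) -> Vec3:
    """Equation 3.20 applied per component: aim - (hit - target)."""
    return tuple(a - (h - t) for a, h, t in zip(aim, hit, target))


def state_based_aim(aim: Vec3, hit_model: Callable[[float, float], Vec3],
                    x: float, y: float, target: Vec3) -> Vec3:
    """Look up the hit expected from paddle location (x, y), then correct the aim."""
    return corrected_aim(aim, hit_model(x, y), target)


# Hypothetical model: the y landing point drifts with the y location of the hit.
def example_hit_model(x: float, y: float) -> Vec3:
    return (0.0, 0.5 * y, 1.0)


target = (0.0, 0.0, 1.0)
new_aim = state_based_aim(target, example_hit_model, 0.0, 0.2, target)
# new_aim is approximately (0.0, -0.1, 1.0)
```

In this sketch the y aim is shifted by the same amount as the expected y error, in the opposite direction, while the x and z aims are unchanged because the hypothetical model predicts no error along those axes.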

The notion that the aim is dependent on the state of the system — where the 
ball will be hit from — deserves some explanation. The implication is that the 
error in the task model is not uniform throughout the workspace of the juggling 
robot. Instead, the error explicitly depends on where the ball is hit from: the
x and y state of the system.

3.5.3 Applying State-Based, Task-Level Learning 

In this subsection, we describe how to implement task-level learning on the jug- 
gling system. We begin by describing a table-based method of task-level learning 
that is based on the error grids described above. We then describe two function- 
based methods in which functions are fit to the performance error data. Finally, 
we directly address the question of how to generalize past experience when apply-
ing task-level learning.

Our first experiments involved a table-based method of task-level learning. We
began by only trying to reduce the y performance error of the juggling system. 
Our motivation was that the ball usually oscillated front-to-back until it was out 
of reach. To reduce the y errors, the y performance error grid was used. Based 
on the position from where the ball was to be hit, the y error grid shown in 
Figure 3.13 was used to adjust the aim. The adjustments in aim were made using 
Equation 3.20. Since only the y aim was modified, Equation 3.20 was decomposed 
into component equations 

aim^x_{n+1} = aim^x_n    (3.21)

aim^y_{n+1}(x, y) = aim^y_n - (hit-grid^y(x, y) - target^y)    (3.22)

aim^z_{n+1} = aim^z_n    (3.23)

Note that the x, y, z superscripts denote the component of the aim. The script 
x and y denote two state variables of the system — the x, y location from which 
the ball is hit. Only the aim in the y direction, aim^y, was adjusted in these
experiments. Ten juggling sequences were performed with the robot using this 
correction to the aim. Unfortunately, the average performance of the robot did 
not improve. On average, eight hits were performed both before and after the 
learning was applied. 

The table-based method was unsuccessful for two reasons. First, the coarse
resolution of the table masks the character of the inaccuracies in the task model.
Averaging the performance errors in each 0.12 m square grid box obscures the
variations in performance error. One way to avoid this problem is to increase the
resolution of the grid, but that implies doing many more juggling trials to record 
enough observations. As the data indicate, even at this coarse resolution, the 
lower-left grid box contains no observations. Second, system noise is an issue 
because the observations that make up the grid are not uniformly distributed. 
As Figure 3.13 shows, the data is concentrated along the two center columns of 
the grid. Several of the other averages in the grid are in fact based on only one 
observation. Any noise in those observations would seriously affect the learning 
process. 
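
The two weaknesses of the table-based method can be seen in a small sketch. The code below is a hypothetical reconstruction: the 0.12 m box size comes from the text, but the grid origin, the function names, and the observations are assumptions made for illustration. Each box stores its observation count next to its average so that empty boxes and boxes resting on a single observation are easy to detect.

```python
import math
from collections import defaultdict

CELL = 0.12  # grid box size in metres (from the text)


def cell_index(x, y, cell=CELL):
    """Map a paddle location to integer grid indices (assumed grid origin at 0, 0)."""
    return (math.floor(x / cell), math.floor(y / cell))


def build_error_grid(observations, cell=CELL):
    """observations: iterable of (x, y, y_error). Returns {index: (mean_error, count)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, err in observations:
        key = cell_index(x, y, cell)
        sums[key] += err
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


def grid_error(grid, x, y, cell=CELL):
    """Return (mean_error, count) for the box containing (x, y), or None if empty."""
    return grid.get(cell_index(x, y, cell))


# Hypothetical observations: two hits share a box, one box holds a single hit.
obs = [(0.01, 0.01, 0.10), (0.05, 0.02, 0.20), (0.13, 0.01, -0.30)]
grid = build_error_grid(obs)
```

A box with a count of one passes its noise straight into the aim correction, and a box that returns None offers no correction at all.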

Our second set of experiments involved fitting a planar function to the per- 
formance error observations. Again, we initially worked with the y performance 
errors. A plane was fit to the errors that occurred when a ball was hit from loca- 
tion x,y on the paddle. (A plane was fit to the raw data shown in Figure 3.15, 
not the cumulative averages of the y grid in Figure 3.13.) For these learning 
experiments, Equation 3.20 became 

aim^x_{n+1} = aim^x_n    (3.24)

aim^y_{n+1}(x, y) = aim^y_n - (hit-function^y(x, y) - target^y)    (3.25)

aim^z_{n+1} = aim^z_n    (3.26)

Based on the planar fit to the data, the term hit-function^y(x, y) was set to

hit-function^y(x, y) = 0.062 + 0.243 \cdot x + 1.042 \cdot y    (3.27)

Note the large dependence of the y performance error on the y location from 
which the ball was hit. We performed 15 juggling trials with this aim correction 
equation. The average number of hits rose from eight to ten. 
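
A planar fit of this kind can be sketched as an ordinary least-squares problem. The code below is a minimal illustration that assumes a standard least-squares fit solved through the normal equations; the text does not state which fitting procedure was actually used, and the function names and synthetic data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: observations do not span a plane")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_plane(points):
    """points: list of (x, y, err). Returns (c0, cx, cy) for err ~ c0 + cx*x + cy*y."""
    basis = [(1.0, x, y) for x, y, _ in points]
    errs = [e for _, _, e in points]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * e for r, e in zip(basis, errs)) for i in range(3)]
    return tuple(solve_linear(ata, atb))


# Synthetic, noise-free data generated from Equation 3.27 (not the recorded hits).
points = [(x, y, 0.062 + 0.243 * x + 1.042 * y)
          for x in (-0.2, 0.0, 0.2) for y in (-0.2, 0.0, 0.2)]
coeffs = fit_plane(points)
```

On noise-free data the fit recovers the coefficients of Equation 3.27; on the noisy juggling observations the same procedure would return only the best-fitting plane.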

With this encouraging result, we fit a second plane to the new performance 
errors that occurred during these juggling trials. We effectively decided to iterate 
using planar functions that describe the performance errors. Since the new per- 
formance errors occurred with the new aims, the new planar function was added 
or superimposed onto the first one. As a result, Equation 3.26 was reused, but 
Equation 3.27 became 

hit-function^y(x, y) = 0.062 + 0.243 \cdot x + 1.042 \cdot y
                     + 0.046 + 0.002 \cdot x - 0.228 \cdot y    (3.28)

Together, these planes describe the performance error of the juggling robot. With 
these refined corrections, ten juggling sequences were performed. Unfortunately, 
the average performance fell from eight hits to six. This second state-based learn- 
ing iteration degraded system performance. 

The problem with this function-based approach was in the planar functions 
chosen to describe the state-dependent performance errors. The problem can 
be examined from several viewpoints. First, planar functions did not accurately 
describe the performance errors from a qualitative standpoint. The data in Fig- 
ure 3.15 suggests a parabolic relation between y performance error and the y 
location from which the ball was hit. Second, from a statistical perspective, pla- 
nar fits did not accurately describe the data. A planar fit to the data has a fitting 
coefficient of 32%, while a second order polynomial fit has a fitting coefficient of 
approximately 60%. Third, an observation of the robot's performance suggests 
that the second hit consistently propelled the ball too far back. The ball was 
hit backwards because of an inaccurate correction to the aim. For the second 
hit, the ball generally falls near the center of the paddle (x, y = 0.0, 0.0), and
hit-function^y(x, y) reduces to 0.108 m. This constant is applied as a correction to
the aim, making the robot aim 0.108 m further back. This correction directly con- 
tradicts the successful learning experiments for the second hit in which balls are 
hit from (x,y = 0.0,0.0). In those experiments, an aim was successfully learned, 
suggesting that the constant term in the hit-function should be close to 0.0 m. We 
thus conclude that planar fits obscure the second order nature of the performance 
error. 
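
The 0.108 m constant quoted above follows directly from summing the two planes of Equation 3.28. The short check below uses only the published coefficients; the function names are ours.

```python
PLANE_1 = (0.062, 0.243, 1.042)   # first plane, Equation 3.27
PLANE_2 = (0.046, 0.002, -0.228)  # second plane, added in Equation 3.28


def plane(coeffs, x, y):
    c0, cx, cy = coeffs
    return c0 + cx * x + cy * y


def summed_planes(x, y):
    """Superimposed planar correction of Equation 3.28."""
    return plane(PLANE_1, x, y) + plane(PLANE_2, x, y)


center_correction = summed_planes(0.0, 0.0)
# center_correction is approximately 0.108 (0.062 + 0.046)
```

Collecting terms, the superimposed planes are equivalent to the single plane 0.108 + 0.245 \cdot x + 0.814 \cdot y, so the offset at the paddle center grows with each planar iteration rather than shrinking.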

Our third set of experiments involved a function-based approach that fits sec-
ond order polynomials to the performance error. Our motivation is both quali-
tative and statistical. From a qualitative point of view, the data shown in Fig-
ure 3.15 exhibits a parabolic nature. Statistically, a second order polynomial fit to
the performance data raises the fitting coefficient from 32% to 60%. Once again,
we concentrated on only learning the y performance errors. Equation 3.26 served
as the basis for learning, and Equation 3.27 became

hit-function^y(x, y) = -0.028 + 0.188 \cdot x - 1.020 \cdot y
    - 1.812 \cdot x \cdot y + 0.037 \cdot x^2 + 7.517 \cdot y^2    (3.29)

Figure 3.17: Y Performance Errors After Learning

Based on this function-based task-level learning approach, the robot successfully 
hit the ball an average of 25 times over ten trials. The low was 12 hits and the 
high of 44 hits (a software limit at the time) was reached three times. 
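
To make the shape of the second order correction concrete, the sketch below evaluates Equation 3.29 and applies it through the y update of Equation 3.25. The coefficients are those published above; the function names are hypothetical, and target^y = 0.0 (the paddle center) is an assumption.

```python
# Coefficients of Equation 3.29, ordered as 1, x, y, x*y, x^2, y^2.
COEFFS_Y = (-0.028, 0.188, -1.020, -1.812, 0.037, 7.517)


def hit_function_y(x, y, c=COEFFS_Y):
    """Second order polynomial fit to the y performance data (Equation 3.29)."""
    return c[0] + c[1] * x + c[2] * y + c[3] * x * y + c[4] * x * x + c[5] * y * y


def corrected_aim_y(aim_y, x, y, target_y=0.0):
    """Equation 3.25 with the second order hit function; target_y = 0.0 is assumed."""
    return aim_y - (hit_function_y(x, y) - target_y)


at_center = hit_function_y(0.0, 0.0)   # approximately -0.028
at_back = hit_function_y(0.0, 0.1)     # approximately -0.05483
at_front = hit_function_y(0.0, -0.1)   # approximately 0.14917
```

The constant term is now small (-0.028 m, compared with 0.108 m for the summed planes), and the y^2 term makes the correction asymmetric about the paddle center, which a plane cannot represent. The labels "back" and "front" are only names for the two sample points; the text does not state which sign of y faces the front of the paddle.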

The successful juggling trials can be analyzed from several perspectives. First, 
the new errors in performance can be plotted against the location from which 
the ball was hit. Figure 3.17 shows the y performance errors that resulted once 
learning was applied. These two graphs can be compared directly to the plots in 
Figure 3.15 that describe the juggling performance errors before learning is ap- 
plied. Second, a polynomial can be fit to the new data to identify any remaining 
structure. A fit was performed, but the fitting coefficient of 3% indicates that the 
location from which the ball was hit no longer explains the performance errors. 
Furthermore, the standard deviation of the y performance errors is 0.10 m, sug- 
gesting that the errors have been nearly reduced to the level of noise in the system 
(\sigma_y = 0.07 m).
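
The fitting coefficient is not defined in this section. Assuming it is the usual coefficient of determination (the fraction of the variance in the observations that the fit explains), it could be computed as in the sketch below; the function name and the sample numbers are hypothetical.

```python
def fitting_coefficient(observed, predicted):
    """Assumed definition: 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((o - mean) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


errors = [0.1, 0.2, 0.3, 0.4]                            # hypothetical y errors
perfect = fitting_coefficient(errors, errors)            # fit explains everything
no_structure = fitting_coefficient(errors, [0.25] * 4)   # fit is just the mean
```

Under this reading, a coefficient near zero, such as the 3% reported above, means the fitted function predicts the errors no better than their mean does.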

In our final set of experiments, state-based task-level learning was applied to
eliminate all task errors along the x, y, and z axes. These experiments are similar to
those in which the y performance error was fit by a second order polynomial. Now, 
the x and z errors are also fit by second order polynomials, and each function is 
simultaneously used to improve juggling performance. The equations that correct 
for the task errors are rewritten as 

aim^x_{n+1}(x, y) = aim^x_n - (hit-function^x(x, y) - target^x)    (3.30)

aim^y_{n+1}(x, y) = aim^y_n - (hit-function^y(x, y) - target^y)    (3.31)

aim^z_{n+1}(x, y) = aim^z_n - (hit-function^z(x, y) - target^z)    (3.32)

The corresponding functions are obtained by fitting data to the x, y, and z per- 
formance errors 

hit-function^x(x, y) = -0.029 + 0.037 \cdot x - 0.426 \cdot y
    - 0.772 \cdot x \cdot y + 1.385 \cdot x^2 - 0.603 \cdot y^2    (3.33)

hit-function^y(x, y) = -0.028 + 0.188 \cdot x - 1.020 \cdot y
    - 1.812 \cdot x \cdot y + 0.037 \cdot x^2 + 7.517 \cdot y^2    (3.34)

hit-function^z(x, y) = 0.036 - 0.534 \cdot x + 0.212 \cdot y
    - 1.648 \cdot x \cdot y - 2.344 \cdot x^2 - 2.886 \cdot y^2    (3.35)

The fitting coefficients for these functions were 24%, 60%, and 76%, respectively. 
The small fitting coefficient for errors in the x direction indicates that the location
from which the ball was hit is not a good predictor of the x error. The experimental results
were fairly good. The juggling robot averaged 21 hits, with a low of 7 and a high
(just once) of 44. From a qualitative standpoint, the robot seemed a bit more 
erratic. The erratic behavior and the smaller average number of hits can partially 
be explained by the use of a polynomial to correct the x aim. Corrections based 
on such a poor fit could have caused unstable performance. 

The successful results of these experiments can be analyzed both graphically 
and statistically. The performance errors that occurred can be graphed as a 
function of the location from which the ball was hit. The errors that occurred 
after the task-level learning was applied are shown in Figures 3.18, 3.19, and 
3.20. These graphs can be compared directly to those of Figures 3.14, 3.15, 
and 3.16. To check whether any second order structure remained in the data, 
second order polynomials were fit to each performance error. Fitting coefficients 
for the polynomial were all under 25%. The implication is that the position 
from which the ball was hit no longer predicts performance error. In addition, 
the standard deviations of the hit errors are \sigma_x = 0.07 m, \sigma_y = 0.10 m, and
\sigma_z = 0.03 m, close to the noise level of the system.

Each of the three approaches described above (table-based, planar-function-based,
and parabolic-function-based) is an effort at generalizing task-level information.
In each case, the method provides a correction to the aim for a ball that
is hit anywhere on the paddle. The basis for this correction is a series of hits,
or experiences, that occurred at discrete points on the paddle. These 160 experi- 
ences are generalized to a hit from any location on the paddle. In the table-based 
method, the observations were generalized by dividing the paddle into a grid, and 
averaging all observations in each section. In the function-based methods, a pla- 
nar or polynomial function is fit to the data points. In both methods, the 160 
experiences are generalized to corrections in the x, y, and z aim. 

These three approaches provide a convenient way of dealing with noise during 
the generalization process. In the table-based method, observations are averaged, 
reducing the effect of noise in the system. In the function-based methods, a least- 
squares procedure is used to find a surface that best describes the observations. In 
both approaches, the effect of noise is largely reduced, allowing task-level learning 
to improve system performance. 



3.6 Discussion 

We have demonstrated that an extended form of the task-level learning algorithms 
successfully improves the performance of a juggling robot. The juggling system 
practiced the task, monitored its own performance, and adjusted its aim to better 
perform the task. Overall, we note three major contributions. 

The first contribution of this chapter is that the performance of a complex 
juggling system can be improved with task-level learning. A task model of juggling
formed the basis for describing and improving the robot system's performance. 
The task-level algorithms developed in Chapter 2 were extended so that they could 
be applied to a complex, multi-dimensional juggling task. The task-level learning 
algorithms were further extended to take into account some of the state variables 
of the juggling system. This extended form of the task-level learning algorithms
improved the performance of the juggling system from an average of 8 hits to an
average of 25 consecutive hits. 

The second contribution is that generalizing past experience is fundamental 
to improving the performance of a robot system. Generalization is useful because 
past experience provides an indication of future robot performance. In the case 
of juggling, performance errors are generalized on the basis of the location from
which the ball is hit. Many experiences are necessary because errors in the task 
model are distributed non-uniformly over the robot's workspace. A robot system 
can learn from past experience, using task-level learning to correct for the perfor- 
mance errors that are likely to occur. For the juggling system, generalizing past 
experience was instrumental in improving performance from an average of 8 hits 
to an average of 25 consecutive hits. 

The third contribution is to identify the need for a sequence of training steps 
that improve the performance of a complex task. In this case, the training steps 
correspond to three subtasks: the first juggling hit, the second juggling hit, and the 
successive juggling hits. The performance of each subtask in the training sequence 
can be improved by task-level learning. So far, the form of the training sequence 
is chosen by the researchers based on an intuitive understanding of the juggling 
task. The process of choosing a particular set of training steps is an avenue for 
further research. Our conclusion is that a training sequence is important for the 
juggling task, and may be necessary for other complex tasks. 

Finally, we want to answer the question: what has the juggling robot learned? 
The robot system has learned what aim to use in order to hit a ball back to the 
center of the paddle. In fact, the robot has learned a number of different aims 
that depend on the location from which the ball will be hit. The variation in these 
aims is based on how accurately the task model describes the juggling system for 
different hits. 



Chapter 4 

Other Ways to Improve Robot 
Performance 



In this chapter, we discuss other approaches that researchers have used to get 
robots to perform a variety of tasks. In order to do so, we survey several robot 
systems that have been built in recent years. We focus on the approach that each 
research group has chosen to improve system performance. 

We broadly classify the surveyed research into two categories: calibration and 
non-calibration research. First, we discuss calibration research which we further 
divide into two phases. Component model calibration emphasizes the precise iden- 
tification of the parameters of the component kinematics, dynamics, and visual 
models of a robot system. System calibration involves aligning the component 
models of a robot system in order to control the entire system. Second, we dis-
cuss approaches that researchers have taken when calibration has not led to
successful task performance. These non-calibration approaches include feedback 
control and other iterative schemes. 

To put task-level learning in perspective, we include a section that briefly 
compares calibration and learning. 

To provide some background, we briefly survey recent work in trajectory learn- 
ing that proposes practice as a means of improving performance. 

4.1 Calibration Approaches 

One way to increase robot functionality is to control the robot using better mod- 
els. A good deal of research in robotics has thus been directed towards accurately 
modeling the component modules of robot systems — the kinematics, dynamics, 
actuation, and sensing components. These models are calibrated to correctly pre- 
dict the response of the component systems of the robot. Once these component 
models are calibrated, they must be aligned with respect to one another. This 
process, which we call system calibration, is an integral part of building a robot
system. After calibrating both the component models and the system as a whole,
the robot system can often be commanded to perform a wider range of tasks. 

In this section, we examine calibration research for some clues on how the 
performance of robot systems can be improved. The literature is extensive in this 
field, and we do not survey all recent work. Instead, we try to examine some rep- 
resentative robot systems [Andersson 1988; Beni, Hackwood, and Trimmer 1984; 
Clocksin et al. 1985; Gershon and Porat 1988; Ikeuchi et al. 1986; Inoue and 
Inaba 1984; King et al. 1988; Liebes et al. 1988; Lozano-Perez et al. 1987; Luo, 
Mullen, and Wessell 1988; Roth and O'Hara 1987; Skaar, Brockman, and Han- 
son 1987; Taylor, Hollis, and Lavin 1985; Whitney 1987] in order to understand 
the issues relevant to robot learning. The first subsection is devoted towards 
identifying the basic issues in calibrating the component models of a robot. The 
second subsection serves to identify the calibration issues that arise when a com- 
plete robot system is assembled. 

4.1.1 Component Model Calibration 

Much recent work has concentrated on accurately identifying and calibrating the 
dynamic, kinematic, and sensing models of robots. We briefly touch upon research 
in the areas of robot dynamics and kinematics in order to point out some of the 
important issues involved. For a more detailed investigation of the field, we refer 
the reader to a survey of the literature [Hollerbach 1988]. 

The first issue in model calibration is to choose an accurate structural model 
of the system. A rich and accurate model is necessary for each component of 
the robot system. For example, in the area of robot kinematics, the Denavit- 
Hartenberg representation of a serial link manipulator is often chosen [Denavit 
and Hartenberg 1955]. This representation makes certain assumptions about the 
structure of the system. The drive train is assumed to be free from backlash, 
compliance, and gear transmission error. In addition, nearly-parallel neighboring 
joint axes make the kinematic parameters extremely sensitive to measurement 
error. When the parameters of this structural model are accurately estimated, a 
robot can be controlled to accuracies of approximately 0.5-1.0 mm [Hayati and 
Roston 1986]. 
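
For readers unfamiliar with the representation, the sketch below builds the homogeneous transform of one link from its four Denavit-Hartenberg parameters (classic convention) and chains the links to obtain the tip position. It is an idealized illustration that shares the assumptions listed above (no backlash, compliance, or gear transmission error); the two-link arm and the function names are hypothetical.

```python
import math


def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one link, classic Denavit-Hartenberg convention."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(p, q):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def tip_position(dh_rows):
    """dh_rows: list of (theta, d, a, alpha), base to tip. Returns (x, y, z)."""
    t = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for row in dh_rows:
        t = matmul(t, dh_transform(*row))
    return (t[0][3], t[1][3], t[2][3])


# Hypothetical planar two-link arm with unit link lengths.
tip = tip_position([(math.pi / 2, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0)])
# tip is approximately (0.0, 2.0, 0.0)
```

Kinematic calibration then amounts to estimating the (theta, d, a, alpha) values of each link so that the computed tip position matches the measured one.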

To improve the positioning accuracy of a robot, the structure of the kinematic 
model needs to be extended. These extensions routinely attempt to accurately 
model the features that the original Denavit-Hartenberg structure neglects — 
nearly parallel joint axes, joint backlash and compliance, and gear transmission 
error. When the robot kinematic model is extended to include these effects, a 
robot can be controlled to accuracies of up to 0.2-0.3 mm [Chen and Chao 1986; 
Whitney, Lozinski, and Rourke 1986]. 

The second major issue in model calibration is estimating the parameters of the 
model. This parameter estimation procedure tends to vary based on the particular
model of the component system. The essential feature is to operate the system
over the workspace of interest, and to find parameters of the model that maximize 
the model's predictive accuracy. For example, in robot dynamics calibration, 
parameters corresponding to masses and inertias of the robot links are estimated 
based on the Newton-Euler model of rigid body dynamics. Parameter estimation 
begins by moving the manipulator through its workspace. Based on estimates 
of arm acceleration and corresponding motor torques, the model parameters are 
estimated [Atkeson, An, and Hollerbach 1986; Mayeda, Osuka, and Kangawa 1984; 
Mukerjee 1984; Neuman and Khosla 1985; Olsen and Bekey 1985]. 
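
The estimation step can be illustrated with a deliberately simplified, single-joint example. The sketch below is not the Newton-Euler formulation used in the cited work: it assumes a hypothetical joint whose torque is inertia times acceleration plus a viscous friction term (the friction term is our addition for illustration) and recovers both parameters by least squares.

```python
def estimate_inertia_and_friction(samples):
    """samples: iterable of (acceleration, velocity, torque) for one joint.

    Fits torque = inertia * acceleration + friction * velocity by least squares,
    solving the 2x2 normal equations in closed form.
    """
    saa = sav = svv = sat = svt = 0.0
    for acc, vel, tau in samples:
        saa += acc * acc
        sav += acc * vel
        svv += vel * vel
        sat += acc * tau
        svt += vel * tau
    det = saa * svv - sav * sav
    if abs(det) < 1e-12:
        raise ValueError("motion does not excite both parameters")
    inertia = (sat * svv - sav * svt) / det
    friction = (saa * svt - sav * sat) / det
    return inertia, friction


# Synthetic data from a hypothetical joint with inertia 0.5 and friction 0.1.
motion = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (-1.0, 2.0)]
samples = [(acc, vel, 0.5 * acc + 0.1 * vel) for acc, vel in motion]
inertia, friction = estimate_inertia_and_friction(samples)
```

The error raised for a near-singular system reflects the point made above: the manipulator must be moved through enough of its workspace for the data to determine every parameter.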

These two major issues must be faced when a component model of a robot 
system is calibrated. A model structure must be chosen and then the model 
parameters must be estimated. For each model, a different structure is identified 
that accurately describes the system and the resulting parameters are determined. 
Models of cameras, lasers, and other sensors are also calibrated in this fashion. 

4.1.2 Complete System Calibration 

Once the component models of a system are calibrated, they must be combined 
to describe the entire robot system. During this system calibration phase, several 
major issues arise. First, the robot system can be calibrated using one of two 
approaches. Either the coordinate frames of component models are aligned or one 
component system is used to calibrate another. Second, each component model is
only as important as its component system's effect on total system performance. As
a result, different component systems are modeled to different accuracies. Third, 
models are often calibrated in the vicinity of the task to eliminate the need for 
more precise models. In this fashion the model is "tuned" to a particular area of 
the workspace. Fourth, the final test of the effectiveness of system calibration is 
whether the robot can be commanded to successfully perform the task. 

The coordinate frames of component models are often aligned with respect 
to one another. Alignment allows information from one model to be properly 
used by another. This system calibration approach was necessary in several robot 
systems. In the Handey robot system [Lozano-Perez et al. 1987], the range sensor 
world coordinate frame, the solid modeler world frame, and the robot kinematics 
world frame needed to be aligned. A significant amount of effort was put into 
calibrating these component models before the system could function properly. In 
a parts acquisition robot system [Roth and O'Hara 1987], the Cartesian frame of 
the range sensor needed to be aligned to the Cartesian frame of the robot tool. 
This calibration step was necessary because the sensor was mounted directly on 
the robot. A similar model alignment situation arose in an arc welding robot 
system [Clocksin et al. 1985] where a laser range sensor was mounted directly on 
the tool of the robot. 

Some researchers have avoided this coordinate frame alignment problem by
using one component system to calibrate another. In a system developed by Inoue
and Inaba [1984], the stereo vision model is calibrated by moving the robot tip in 
the field of view of the cameras. Calibration of the stereo pair is based on estimates 
of robot position obtained from the robot kinematic model. Researchers have 
recently developed elaborate calibration procedures for this situation, terming the 
problem hand-eye calibration [Tsai and Lenz 1987; Shiu and Ahmad 1987]. The 
essential feature is to use the kinematic model to provide Cartesian coordinates to 
the sensor system. In a different context, the stereo vision model of a mobile robot 
[Brooks, Flynn, and Marill 1987] is calibrated relative to the forward motion vision 
system. Forward velocity, the output from the motion vision model, is used as the 
calibration input to the stereo vision system. In each of these cases, calibration 
of the component model is only as accurate as that of the component model 
used to provide calibration information. For example, in the hand-eye calibration 
schemes, vision calibration is only as accurate as the robot kinematic model. 

A second major calibration issue is the relative accuracy of each component 
model in the robot system. Researchers calibrate a component model relative 
to the component system's importance in performing the task. In the case of a 
ping-pong playing robot system [Andersson 1988], significant effort was placed in 
calibrating the vision system while other component systems were only modeled 
in a rudimentary way. Two separate pairs of stereo cameras were used to im- 
prove the accuracy of the stereo ranging model and camera timing characteristics 
were carefully modeled. Emphasis was placed on accurately modeling component 
systems that were critical to performing the ping-pong task. In a printed-circuit 
board assembly system [Liebes et al. 1988], researchers were forced to improve 
both kinematic and camera component models. A table memory was used to 
model the transformation between Cartesian space and joint space because the 
standard Denavit-Hartenberg model was too inaccurate for the task. Another ta- 
ble memory was used to model the effects of image distortion caused by the lens 
and imager of an inexpensive camera. Only after both component systems were 
modeled more accurately could the robot perform the assembly task. In a parts 
acquisition system [Roth and O'Hara 1987], the accuracy of a laser range sensor 
model prevented the robot from acquiring the part successfully. Researchers im- 
plemented a table memory to accurately model the effect of sensor nonlinearities. 
In each of these robot systems, the accuracy of a particular component model was 
improved before the entire system could function properly. 
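
A table memory of the kind mentioned above can be sketched as a lookup table that maps raw sensor readings to calibrated values. The cited systems do not describe their tables in this detail, so the linear interpolation between entries, the clamping at the table ends, and all names and numbers below are assumptions.

```python
from bisect import bisect_right


def table_correct(table, raw):
    """table: list of (raw_reading, true_value) pairs sorted by raw_reading.

    Returns the true value for a raw reading, interpolating linearly between
    neighbouring entries and clamping outside the calibrated range.
    """
    raws = [r for r, _ in table]
    if raw <= raws[0]:
        return table[0][1]
    if raw >= raws[-1]:
        return table[-1][1]
    i = bisect_right(raws, raw)
    (r0, v0), (r1, v1) = table[i - 1], table[i]
    return v0 + (v1 - v0) * (raw - r0) / (r1 - r0)


# Hypothetical calibration of a nonlinear range sensor (raw reading, true range).
calibration = [(0.0, 0.0), (1.0, 1.2), (2.0, 2.1)]
mid_low = table_correct(calibration, 0.5)    # approximately 0.6
mid_high = table_correct(calibration, 1.5)   # approximately 1.65
```

A table of this form models whatever nonlinearity the calibration data exhibits without committing to a structural model, at the cost of collecting an entry for every region of interest.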

A third major issue is that system calibration is usually performed in the vicin- 
ity of the task. While structural models theoretically describe the system over its 
entire operating range, in practice they do not. As a result, models are calibrated 
near the area in which the system will perform the task. For vision models, cal- 
ibration fixtures are often placed near the area in which the robot will operate 
[Tsai and Lenz 1987]. Some researchers have exploited the concept of calibrating 
a robot in the area where the task is to be performed. In a real-time ball-catching
system [Skaar, Brockman, and Hanson 1987], the model describing the relation
between camera coordinates and the robot arm is estimated continuously. As 
the ball gets closer to the robot's cup, new estimates of model parameters are 
weighted more heavily. Other researchers have implemented a similar approach 
in the area of hand-eye coordination [Liang, Lee, and Hackwood 1988], terming 
the procedure dynamic self-calibration. 

The fourth major issue is that calibration is complete when a robot system 
can successfully achieve the task. At this point, the component models describe 
the robot system "accurately enough" and no learning is necessary to improve 
the performance of the task. With this thought in mind, no further calibration 
is necessary in a variety of recent systems appearing in the literature, including 
a vision-based grasping system [Ikeuchi et al. 1986], a parts acquisition system 
[Roth and O'Hara 1987], a manipulation system [Lozano-Perez et al. 1987], and 
a circuit-board assembly system [Liebes et al. 1988]. Other systems that have 
been described in the literature are ripe for additional calibration or learning 
schemes if excellent performance is required. These systems, which do not always 
successfully perform the desired task, include a conveyor tracking system with an 
87% success rate [Luo, Mullen, and Wessell 1988], a ball-catching system which 
succeeds 80% of the time [Skaar, Brockman, and Hanson 1987], and a robot ping- 
pong system [Andersson 1988]. 

4.2 Non-Calibration Approaches

In this section, we discuss methods that researchers have taken when their cali- 
brated robot systems were unable to perform the desired task. We describe some 
feedback control techniques as well as iterative approaches that have been used by 
researchers to improve system performance. We also examine why tasks should 
be defined in sensor coordinates, explaining how this approach improves system
accuracy. 

4.2.1 Feedback Control 

Feedback control is one technique for improving the performance of a robot sys- 
tem. In feedback control, commands to the robot are modified based on errors in 
performance. The most common robot application implemented in terms of feed- 
back control is the task of visually servoing a robot to a desired position. Weiss 
[1984] formally analyzes and classifies feedback control approaches to the visual 
servoing task. The simplest approach, termed "static look and move," involves
commanding the robot in world coordinates, visually estimating the task error 
in world coordinates, and updating the world coordinate command to the robot. 
The approach is termed "static" because each step is performed sequentially.

A second feedback control approach, termed "dynamic look and move" by
Weiss, is similar to the static approach, except that each step is processed in par- 
allel. The "dynamic look and move" approach has been used by several different 
robot researchers to improve system performance. In a robotic sewing system 
[Gershon and Porat 1988], two dynamic servo controllers are used to maintain 
cloth tension and to produce constant seam width. For each servo, a sensor mea- 
sures the task error, and a controller with a simple model of the task updates 
the commands to the robot. In an arc welding system [Clocksin et al. 1985], the 
feedback controller compares sensor measurements of task error with previously 
defined sensor readings. In the event that the current sensor measurements differ 
from the previously "taught" measurements, commands to the robot are modified 
to reduce welding errors. 

In both uses of feedback control, the issue of model-based command modifi- 
cations is addressed. First, task errors are transformed to robot command mod- 
ifications using models that relate errors to commands. In each robot system, 
the model is a very simple one, involving only a feedback control gain. In the 
case of robotic sewing, a gain transforms sensed errors in seam width to angular 
commands to the robot. In the arc welding robot, the transformation from task 
errors to robot Cartesian commands is also based on a gain. Weiss suggests that 
for best system performance, an accurate model of the robot system should relate 
task errors to command modifications. 
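
The gain-based command modification described above can be sketched as a simple loop. The plant below is a hypothetical stand-in for a robot and sensor (it is not a model of the sewing or welding systems), and the gain value is arbitrary.

```python
def servo_step(command, measured, desired, gain):
    """Modify the command in proportion to the sensed task error."""
    return command - gain * (measured - desired)


def run_servo(plant, command, desired, gain, steps):
    """Repeatedly sense the plant output and update the command.

    Returns the final command and the history of measurements.
    """
    history = []
    for _ in range(steps):
        measured = plant(command)
        history.append(measured)
        command = servo_step(command, measured, desired, gain)
    return command, history


# Hypothetical plant: the measured quantity is a scaled, offset copy of the command.
def example_plant(command):
    return 0.8 * command + 0.3


final_command, history = run_servo(example_plant, 0.0, desired=1.0, gain=0.5, steps=30)
# example_plant(final_command) is very close to the desired value 1.0
```

For this plant the error shrinks by a factor of 0.6 per step, so the loop converges even though the controller knows nothing about the plant except the sign of its response. A gain closer to the inverse of the true plant gain would converge faster, which is consistent with Weiss's suggestion that an accurate model should relate task errors to command modifications.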

Neither of these robot controllers addresses the issue of performing a task. 
First, no framework in which to model and successfully perform the robot task
is described. In fact, only simplified models involving one-dimensional feedback
gains are used to correct task errors. In addition, the researchers are selectively 
choosing which command variables to modify and which errors to sense, based on 
experience and intuition. Second, no notion of a sequence of steps is involved in 
correcting task errors. The feedback is always state dependent, and ignores any 
need for a sequence of robot commands that will eliminate the task error. In a 
more straightforward approach to accomplishing a task, Whitney [1987] outlines 
a process that links low-level robot commands to task performance based on the 
system model. He applies this process to a robotic grinding system. After each 
grinding pass, the system measures grinding errors and issues commands to the 
robot based on a model of the grinding task. 

4.2.2 Iterative Techniques 

Several researchers have implemented iterative techniques that reduce errors in 
task performance. We separate these techniques from feedback control because 
they use a more detailed model of the task. These techniques involve measuring 
the error in sensor coordinates, transforming this error into world coordinates, 
and commanding a robot motion that eliminates the error. 

These iterative schemes are applied to the task of visually-guiding a robot to a 
desired position. In an assembly robot [King et al. 1988], a one-step technique was 
used to improve robot performance beyond the 1.5 mm accuracy achieved during 
calibration. The robot is first commanded to move a screw above a hole, while a 
vision system estimates the resulting Cartesian error. The robot system adds this 
error to the commanded robot position, improving the positioning performance 
of the system and successfully inserting the screw into the hole. In the robot system 
designed for high-precision inspection [Beni, Hackwood, and Trimmer 1984], 
a similar iterative scheme is applied. Based on the Cartesian positioning error 
estimated by one camera, an offset is added to the Cartesian robot command. 
After the robot is commanded to move, a second, high-precision camera estimates 
the new Cartesian error. Based on this error, a second offset is added to the 
robot command. This two-step scheme improves robot precision from 2.0 mm to 
0.05 mm. Both of these approaches are model-based in the sense that the inverse 
model of the system is used to transform errors in performance into command 
corrections. 

Other researchers have implemented similar techniques that iterate until a 
desired level of precision is achieved. In a robot system designed for precise 
manipulation [Taylor, Hollis, and Lavin 1985], parts misalignment is reduced to 
less than 0.01 mm. The iterative steps include sensing the error in camera space, 
transforming the error into Cartesian coordinates, and adding an offset to the 
robot command. The iterative algorithm sequences through these three steps until 
the error is reduced to 0.01 mm. With this iterative approach, the positioning 
error of the robot system can be reduced to the relative precision of the sensor. 
This iterative scheme can be considered multi-dimensional since errors in the 
x, y, and directions are transformed into robot command corrections. The 
command correction step is also model-based since the error in camera coordinates 
is transformed to joint space based on models of the sensor and robot. 
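
The three iterative steps above (sense the error in camera space, transform it to Cartesian coordinates, add an offset to the command) can be sketched as follows. This is an illustration under stated assumptions, not the cited implementation: the constant robot bias, the pixel scale factors, and the function name are hypothetical, and the sensor model is deliberately miscalibrated to show that the loop still converges.

```python
import numpy as np

def iterate_to_tolerance(target_mm, robot_bias_mm=(1.5, -0.8),
                         true_mm_per_px=0.10, est_mm_per_px=0.12,
                         tol_mm=0.01, max_steps=20):
    """Repeat sense -> transform -> offset until the measured error is small."""
    target = np.asarray(target_mm, dtype=float)
    bias = np.asarray(robot_bias_mm, dtype=float)
    command = target.copy()
    true_errors = []
    for _ in range(max_steps):
        # Hypothetical robot: reaches the commanded point plus a fixed bias.
        reached = command + bias
        true_errors.append(float(np.linalg.norm(target - reached)))
        # Step 1: sense the error in camera space (pixels).
        error_px = (target - reached) / true_mm_per_px
        # Step 2: transform to Cartesian space with the imperfect sensor model.
        error_mm = error_px * est_mm_per_px
        if np.linalg.norm(error_mm) <= tol_mm:
            break
        # Step 3: add the estimated error as an offset to the robot command.
        command = command + error_mm
    return command, true_errors
```

With the 20 percent scale error assumed here, each pass overshoots slightly, yet the remaining error still shrinks by a factor of five per iteration. This shows why an approximate model is sufficient for this kind of scheme.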

These iterative techniques resemble our task-level learning algorithms. The 
techniques use a model of the visually-guided robot to transform sensor coordi- 
nates to Cartesian coordinates. For the task of visually-guiding a robot, task 
coordinates coincide with Cartesian coordinates. Based on the Cartesian error 
in performance, the Cartesian commands to the robot are modified. We have 
developed and formalized similar ideas, and are applying them to complex, multi- 
dimensional, dynamic tasks. 

4.2.3 Defining the Task in Sensor Space 

When iterative, feedback, or learning techniques are used, the task is usually 
defined in sensor coordinates. Defining the task in the sensor frame allows the task 
to be performed to the resolution of the sensor [King et al. 1988; Taylor, Hollis, 
and Lavin 1985]. As the sensor resolution is increased, the task can be performed 
more accurately until the actuator resolution is exceeded [Beni, Hackwood, and 
Trimmer 1984]. 

System performance can be improved when a task is defined directly in sen- 
sor coordinates. The sensor can then be used to measure task performance. If, 
instead, a task is defined in a coordinate frame that is related to the sensor by a 
model, the task will only be performed to the accuracy of the model. The error 
in the robot system will be equal to the calibration error between the model and 
the actual coordinate frame transformation. Better system performance can be 
achieved when a sensor directly defines (or measures) task performance. 

Several researchers have defined tasks in sensor coordinates when their robot 
systems used sensory information. Defining the task in this manner prevented 
the inaccuracies of one model — the sensor model — from degrading system perfor- 
mance. Luo, Mullen, and Wessell [1988] defined a conveyor tracking and part 
interception task in camera coordinates. Skaar, Brockman, and Hanson [1987] 
defined a ball-catching task directly in camera space. Inoue and Inaba [1984] defined 
a rope-into-ring task and a knot-tying task directly in camera space. Clocksin 
et al. [1985] described how an arc welding robot system was "taught" a sequence 
of correct sensor readings by running the system on a prototype fixture. 

4.2.4 Strategy Modifications 

Another method of improving system performance is to modify the strategy that 
the system uses. While a full discussion of task strategies is beyond the scope 
of this thesis, we can appeal to intuitive notions of strategies to describe how 
some systems successfully perform tasks. In a parts acquisition system [Roth 
and O'Hara 1987], grasp locations are chosen based on how reliably they can be 
described by the sensor. Grasp point locations that are sensitive to sensor error 
are discarded. The strategy of which grasp points to use is modified in order to 
successfully perform the "parts acquisition" task. In a robot ping-pong player 
[Andersson 1988], strategy modification is an integral portion of the system. The 
plan for how to hit and where to hit the ball is modified by expert "tuners." Each 
tuner, operating in its domain of expertise, estimates the potential success of the 
planned ping-pong hit. If the system estimates that the hit will be unsuccessful, 
the "tuner" modifies the hitting strategy. 

4.3 Calibration and Learning 

In this section, we compare learning and calibration as methods of improving 
system performance. We begin by analyzing the advantages of accurate modeling 
and the difficulties encountered. We discuss the question: how well should a 
system be modeled and calibrated? We suggest that calibration and learning are 
complementary approaches to improving task performance. 

Improving the performance of a system by accurate modeling and calibration 
has several advantages. When the structure of the model is chosen correctly and 
the parameters are estimated correctly, the model is valid for any inputs and 
outputs. The experience gained in selected trials generalizes to the entire range 
of operation. As a result, learning does not need to take place every time the 
model is used. Additionally, a structured model provides a compact method of 
representing the input/output behavior of a system. Data need not be stored for 
every potential scenario, but the model can instead be evaluated when necessary. 

Attempts at accurate modeling and extensive calibration have several short- 
comings. First, the number and range of robot motions necessary to fully estimate 
the model parameters is often large. Making trial motions that actually attempt 
the task may be a more efficient method of reaching the task goal. Second, no 
matter how well the parameters are estimated, the models are often based on 
structural assumptions that are not valid in practice. The Newton-Euler model 
of dynamics, for example, is based on rigid body dynamics which typically is a 
good but not perfect description of robots. Compensating for the structural as- 
sumptions of the model requires a great deal of data, time, and ingenuity. Even 
after compensation some structural modeling errors will probably remain. 

Since accurate modeling and calibration are sometimes difficult, we ask the 
question: how well should a system be modeled? A tradeoff exists between the 
time and energy spent in accurate modeling and the desire for better system 
performance. One answer is that a system should be modeled well enough for the 
task to be achieved. This answer is often chosen when robot systems are calibrated 
until the task can be successfully performed — parts can be grasped, seams can be 
welded, objects can be tracked, and ping-pong can be played. Another answer 
is to model the system accurately enough for learning or iterative schemes to 
be applied successfully. This approach embraces the view that the robot should 
practice a task and learn from experience. 

Finally, we want to suggest that learning and calibration are complementary 
approaches to increasing system performance. With accurate modeling, a robot 
system can successfully perform a larger number of tasks. Accurate modeling 
also increases the likelihood that learning will converge and increases the speed 
of the convergence. With learning, the same tasks can be performed with less 
accurate models of the robot system. Learning is often easier to implement than 
extensive model calibration procedures. The conclusion we reach is that both 
accurate modeling and learning can be used to improve robot performance. 

4.4 Recent Trajectory Learning Research 

In this section, we briefly discuss trajectory learning research. This research fo- 
cused on learning one component model of a robot system — the dynamics model. 
We analyze past work in this area as a base from which to examine task-level 
learning. 

Robot learning research has focused on the trajectory following subtask [Ari- 
moto et al. 1985; Casalino and Gambardella 1986; Craig 1984; Furuta and Ya- 
makita 1986; Hara, Omata, and Nakano 1985; Harokopos 1986; Mita and Kato 1985; 
Morita 1986; Togai and Yamano 1986; Uchiyama 1978; Wang 1984; Wang and 
Horowitz 1985]. Robots are made to follow a particular trajectory more accu- 
rately as they repeat the movement. Feedforward torque commands for repetitive 
movements are refined on the basis of previous movement errors. This research 
has focused primarily on linear learning operators which often ignore the under- 
lying model of the system. The work has also emphasized the stability and not 
the performance of the proposed algorithms. 

We have begun to explore the advantages of using the inverse model as the 
learning operator and how to apply learning algorithms at the task level. Atkeson 
and McIntyre [1986] explored fixed-model learning for the trajectory following 
subtask. The research shows that using the inverse Newton-Euler model as the 
learning operator reduces most of the movement errors. The same learning al- 
gorithms have been applied to the task of positioning a robot at a visual target 
[Atkeson et al. 1987]. The theoretical convergence criteria and performance of 
the learning algorithm were then derived. These learning procedures can now be 
extended to dynamic, complex, multi-dimensional tasks — throwing and juggling. 
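
The idea of refining feedforward torques with an inverse model as the learning operator can be sketched for a single joint. This is a minimal illustration, not the cited algorithm: a one-joint arm with viscous friction stands in for the full Newton-Euler model, and all parameter values and function names are assumptions. The model is fixed and deliberately inaccurate; only the torque command is learned.

```python
import numpy as np

DT, N = 0.01, 100  # time step (s) and number of samples; illustrative values

def simulate(torque, inertia, damping):
    """Euler-integrate inertia*qdd + damping*qd = torque, starting at rest."""
    q, qd, qdd = np.zeros(N), np.zeros(N), np.zeros(N)
    for k in range(N):
        qdd[k] = (torque[k] - damping * qd[k]) / inertia
        if k + 1 < N:
            qd[k + 1] = qd[k] + DT * qdd[k]
            q[k + 1] = q[k] + DT * qd[k]
    return q, qd, qdd

def inverse_model(qd, qdd, inertia_est, damping_est):
    """Fixed (imperfect) inverse dynamics model: motion -> torque."""
    return inertia_est * qdd + damping_est * qd

def learn_trajectory(trials=8, true=(2.0, 0.5), est=(1.6, 0.25)):
    t = np.arange(N) * DT
    qdd_des = 0.5 * np.pi ** 2 * np.cos(np.pi * t)
    # Build the reference with the same integrator (unit inertia, no damping).
    q_des, qd_des, _ = simulate(qdd_des, 1.0, 0.0)
    torque = inverse_model(qd_des, qdd_des, *est)  # first trial: model only
    rms_errors = []
    for _ in range(trials):
        q, qd, qdd = simulate(torque, *true)  # "run" the real arm
        rms_errors.append(float(np.sqrt(np.mean((q_des - q) ** 2))))
        # Learning operator: pass the movement error through the inverse model.
        torque = torque + inverse_model(qd_des - qd, qdd_des - qdd, *est)
    return rms_errors
```

Calling `learn_trajectory()` returns the RMS position error of each trial. Under the assumed parameters the error falls rapidly from trial to trial even though neither the inertia nor the friction estimate is correct.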



Chapter 5 

Conclusion and Future Research 



In this chapter, we draw a number of conclusions from our work in task-level 
learning and suggest several avenues for future research. The conclusions and 
suggestions are based on our experience in developing and implementing task- 
level learning on throwing and juggling tasks. 

5.1 Conclusion 

Task-Level Learning Works. Task-level learning can successfully improve a robot's 
performance of complex, multi-dimensional dynamic tasks. The learning algo- 
rithm is based on a simple notion of how a person throwing a ball corrects for 
errors in performance. The task-level aim of the system is modified based on 
errors in task performance. Learning at the task level improved a robot's performance 
of both a ball-throwing task and a complex juggling task. Without doubt, 
task-level learning could successfully improve a robot's performance of a number 
of other complex dynamic tasks. 

Learning Can Occur at the Task Level. Learning can be applied at the task 
level to improve the performance of robot systems. Less data is necessary to refine 
task-level commands than the many low-level commands that drive the robot's 
component systems. Learning at the task level also reduces the degrees of freedom 
of the models to be learned. Ultimately, learning at the task level can be used 
with learning at other levels to simultaneously improve performance. 

Accurate Models Speed Up Learning. Accurate models improve the initial 
performance of a system and speed up the performance improvement with prac- 
tice. The initial performance of the ball-throwing and juggling tasks was possible 
only with accurate models of the robot systems. Likewise, the performance im- 
provement with practice was based on using accurate models of the systems to 
transform errors into command corrections. Learning does not obviate the need 
for accurate models of the task. Instead, learning and accurate modeling are 
complementary methods of improving task performance. 

Difficulties in Generalizing State-Dependent Errors. One problem with task-level 
learning is the need to generalize task-level errors based on the state of 
the system. In the juggling task, different corrections to the aim were necessary 
depending on the paddle location from which a ball would be hit. The difficulty 
lies in identifying which state variables correlate with the task-level errors. In 
complex systems, the number of state variables to examine is large. However, 
once the important state variables are identified, task-level learning can proceed 
using an extended version of the original learning algorithms. 
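
One possible way to make the aim correction depend on state is sketched below. It is an assumption for illustration only and not necessarily the extension used in this work: the paddle location is taken as the relevant state variable, it is discretized into square cells of a hypothetical size, and a separate scalar aim correction is accumulated for each cell.

```python
import math

class StateDependentAim:
    """Keeps one aim correction per cell of a grid over paddle locations."""

    def __init__(self, cell_size_m=0.1):
        self.cell_size_m = cell_size_m  # hypothetical grid resolution
        self.corrections = {}           # cell index -> accumulated correction

    def _cell(self, x_m, y_m):
        return (math.floor(x_m / self.cell_size_m),
                math.floor(y_m / self.cell_size_m))

    def aim(self, x_m, y_m, nominal_aim):
        """Nominal aim plus the correction learned for this paddle location."""
        return nominal_aim + self.corrections.get(self._cell(x_m, y_m), 0.0)

    def update(self, x_m, y_m, task_error):
        """Add the observed task-level error to this location's correction."""
        cell = self._cell(x_m, y_m)
        self.corrections[cell] = self.corrections.get(cell, 0.0) + task_error
```

After each hit, `update` is called with the paddle location and the observed error (desired minus actual), and the next hit from the same cell uses the revised aim. A table of this kind does not generalize between cells, which is precisely the difficulty raised above.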

The Aim is Learned. Task-level learning improves the performance on a task 
by adjusting the system's aim. The system learns the task-level goal that will 
generate the desired performance of the task. In ball throwing, the system learns 
where to aim the ball so that the ball will land on the target. In juggling, the 
system learns a number of different aims to use in order to hit the ball back to the 
center of the paddle. In each case, the system is learning corrections to the aim 
to compensate for inaccuracies in the component models that describe the task. 
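
The aim adjustment for ball throwing can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the implementation used on the robot: the idealized ballistic inverse model, the fixed 45 degree release angle, and the stand-in `throw` function (an arm that delivers only 90 percent of the commanded speed) are all hypothetical.

```python
import math

G = 9.81                    # gravitational acceleration, m/s^2
ANGLE = math.radians(45.0)  # fixed release angle (assumed)

def inverse_model(aim_m):
    """Idealized ballistic model: release speed expected to reach aim_m."""
    return math.sqrt(aim_m * G / math.sin(2 * ANGLE))

def throw(release_speed, speed_scale=0.9):
    """Stand-in for the real system, with an unmodeled loss of release speed."""
    v = speed_scale * release_speed
    return v * v * math.sin(2 * ANGLE) / G

def learn_aim(target_m, throws=6):
    aim = target_m  # start by aiming directly at the target
    errors = []
    for _ in range(throws):
        landing = throw(inverse_model(aim))
        error = target_m - landing  # task-level performance error
        errors.append(error)
        aim += error                # shift the aim by the observed error
    return aim, errors
```

The inverse model is never changed. Only the aim is adjusted, and under these assumptions it settles at a point beyond the target that compensates for the unmodeled loss in release speed.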

5.2 Future Research 

Generalization. Generalizing experience between similar tasks is an important 
component of learning. The difficulty lies in identifying what to generalize, when 
to generalize, and how much to generalize. Several related juggling tasks could be 
performed with the juggling system to try to answer these questions. The original 
task could be modified to hit the ball to a higher location. The task could entail 
hitting the ball to an x, y location away from the center of the paddle. The task 
could involve hitting a different ball with a different coefficient of restitution. In 
each case, the experience gained in the original juggling task can be used as a 
guide to improving the performance of the modified task. 

Decomposing the Task into Training Steps. The concepts of whether and how 
to decompose a task into training steps are important ones to investigate further. 
In Chapter 3, the juggling task was decomposed into three subtasks based on 
the intuition of the researchers. Instead, a principled method is necessary. Many 
different complex tasks could first be analyzed to understand the potential need 
to decompose a task into subtasks, or training steps. Such an analysis might also 
suggest how to choose the particular set of simpler subtasks. 

Effects of Noise. A detailed statistical analysis of the effects of noise on the 
task-level learning algorithms is necessary. It is important to understand exactly 
how the repeatability of the system affects the speed and stability of the learning 
process. With such an understanding, the learning algorithms can be modified to 
perform successfully in the presence of noise. 

Task Strategies. Adjusting the task strategy is a separate level of learning 
that needs to be investigated further. Up to now, the strategy used to accomplish 
a task has been fixed. For example, in ball throwing, the task strategy involved 
adjusting the release velocity to throw the ball closer or farther. The system never 
used the strategy of modifying the release angle to adjust the distance the ball is 
thrown. We need to understand how to identify, [illegible], and choose from 
among the different task strategies that [illegible]. 